# Anomaly Detection (Trades) - Model Generation

We will use the [Support Vector Data Description (SVDD)](https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=casactml&docsetTarget=casactml_svdatadescription_details.htm&locale=en) algorithm, packaged in SAS Visual Data Mining and Machine Learning (VDMML), and deploy it using the SAS Event Stream Processing ESPPy module to detect outliers in real time using streaming data.


### 0. Set Up the Environment

First, import the necessary packages to run this notebook.

In [1]:
import os
import pandas as pd
import swat
import getpass

### 1. Start a SAS Viya CAS Session

The CAS session is used to invoke SAS Visual Data Mining and Machine Learning (VDMML) to train a new model.

In [2]:
os.environ["CAS_CLIENT_SSL_CA_LIST"] = "/opt/sas/viya/config/etc/SASSecurityCertificateFramework/cacerts/trustedcerts.pem"
password = getpass.getpass()

cashost='frasepviya35smp.cloud.com'
casport=5570
sess = swat.CAS(cashost, casport,'viyademo01',password)


sess.loadactionset(actionset="table")

 ······


NOTE: Added action set 'table'.


### 2. Data Preparation

Define the data to be used during model generation.

In [3]:
# Name the existing CAS table to be used
indata = 'STREAMTOTALCOST_TRAIN'

if not sess.table.tableExists(name=indata,caslib="public").exists:
    tbl = sess.table.loadTable(caslib="public", path=indata+".sashdat", casout={"name":indata,"caslib":"public", "replace":True})

castbl = sess.CASTable(name=indata, caslib="public")

In [4]:
sess.tableinfo(caslib="public")

Unnamed: 0,Name,Rows,Columns,IndexedColumns,Encoding,CreateTimeFormatted,ModTimeFormatted,AccessTimeFormatted,JavaCharSet,CreateTime,Repeated,View,MultiPart,SourceName,SourceCaslib,Compressed,Creator,Modifier,SourceModTimeFormatted,SourceModTime
0,STREAMTOTALCOST_TRAIN,380400,9,0,utf-8,2021-10-01T14:12:50+00:00,2021-10-01T14:12:50+00:00,2021-10-01T15:17:47+00:00,UTF8,1948717000.0,0,0,0,STREAMTOTALCOST_TRAIN.sashdat,Public,0,viyademo01,,2021-10-01T14:12:48+00:00,1948717000.0


In [5]:
castbl.describe()
list(castbl.columns)

['_opcode',
 'tradeID*',
 'security',
 'quantity',
 'price',
 'totalCost',
 'traderID',
 'time',
 'name']

### 4. Import SVDD action set

In [6]:
sess.loadactionset('svdd')

NOTE: Added action set 'svdd'.


### 5. Train SVDD Model

The SVDD algorithm is a one-class classification technique that is useful in applications where data that belongs to one class is abundant, but data about any other class is scarce or missing. Fraud detection, equipment health monitoring, and process control are some examples of application areas where the majority of the data belongs to one class.

In its simplest form, an SVDD model is obtained by building a minimum-radius hypersphere around the one-class training data. The hypersphere provides a compact spherical description of the training data. This training data description can be used to determine whether a new observation is similar to the training data observations. The distance from any new observation to the hypersphere center is computed and compared with the hypersphere radius. If the distance is more than the radius, the observation is designated as an outlier. Using kernel functions in SVDD formulation provides a more flexible description of training data. Such description is nonspherical and conforms to the geometry of the data. PROC SVDD implements only the flexible data description.
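
To make the distance test described above concrete, here is a minimal pure-Python sketch of kernel-based SVDD scoring. It assumes a Gaussian (RBF) kernel of the form exp(-||x - y||^2 / (2 s^2)). The support vectors, weights (`alpha`), and bandwidth are hypothetical toy values, not values from the trained model, and the function names are illustrative rather than part of any SAS API.

```python
import math


def rbf_kernel(x, y, bandwidth):
    """Gaussian kernel: exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def svdd_distance_sq(z, support_vectors, alpha, bandwidth):
    """Squared kernel distance from observation z to the SVDD center.

    dist^2(z) = K(z, z) - 2 * sum_i alpha_i * K(x_i, z)
                + sum_i sum_j alpha_i * alpha_j * K(x_i, x_j)
    """
    cross = sum(a * rbf_kernel(x, z, bandwidth)
                for a, x in zip(alpha, support_vectors))
    const = sum(ai * aj * rbf_kernel(xi, xj, bandwidth)
                for ai, xi in zip(alpha, support_vectors)
                for aj, xj in zip(alpha, support_vectors))
    return rbf_kernel(z, z, bandwidth) - 2.0 * cross + const


# Hypothetical toy model: two support vectors with equal weights (weights sum to 1).
support_vectors = [[0.0], [1.0]]
alpha = [0.5, 0.5]
bandwidth = 1.0

# The threshold R^2 is the squared distance of a boundary support vector.
threshold = svdd_distance_sq(support_vectors[0], support_vectors, alpha, bandwidth)

new_points = [[0.5], [5.0]]
distances = [svdd_distance_sq(z, support_vectors, alpha, bandwidth) for z in new_points]
is_outlier = [d > threshold for d in distances]
print(is_outlier)
```

In this toy setup, the point at 0.5 lies between the two support vectors and falls inside the description, while the point at 5.0 has a distance above the threshold and is flagged as an outlier.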

We use SVDD-based K-charts to detect anomalous behavior in trades. A K-chart is a nonparametric multivariate control chart that is used for statistical process control and can also be used for monitoring equipment health and operating data. It is implemented in two phases. In phase 1, observations from normal operations of the process are collected and used to train an SVDD model and obtain the threshold R-square value.

In [7]:
# Phase 1: Model Training
# Run the svDataDescription.svddTrain action on the training data
sess.svDataDescription.svddTrain(tunemethod="MEAN2",
                         solver='actset',
                         inputs=[{"name":"totalcost"}],
                         id=['_opcode','tradeID*','security','quantity','price','totalCost','traderID','time','name'],
                         savestate = {"name": "trade_outliers_svdd", "replace":True},
                         table={"caslib":"public","name":"STREAMTOTALCOST_TRAIN"})

NOTE: Using Active Set Solver.
NOTE: SVDDTRAIN runs with default maxtime of 1800 seconds.
NOTE: Beginning data reading...
NOTE: Data reading complete.
NOTE: Starting processing for Model 1 (with bandwidth=74382.414603).
NOTE: Ending processing for Model 1 (with bandwidth=74382.414603).
NOTE: Output generation complete.
NOTE: Beginning save state generation...
NOTE: 6717 bytes were written to the table "trade_outliers_svdd" in the caslib "CASUSER(viyademo01)".
NOTE: Save state generation complete.


Unnamed: 0,RowId,Type,N
0,NREAD,Number of Observations Read,380400.0
1,NUSED,Number of Observations Used,380400.0

Unnamed: 0,RowId,Description,Value,nValue
0,OPTMETHOD,Optimization Method,Active Set,
1,KERTYPE,Kernel Type,RBF,
2,BW,RBF Kernel Bandwidth,74382.414603,74382.414603
3,TUNE,Bandwidth Selection Method,Modified Mean,
4,RELSCALE,Bandwidth Relative Scale,1.3521105102,1.352111
5,FRAC,Expected Outlier Fraction,1E-6,1e-06
6,OPTTOL,Optimization Tolerance,0.0001,0.0001
7,NINTVARS,Number of Interval Variables,1,1.0
8,NNOMVARS,Number of Nominal Variables,0,0.0

Unnamed: 0,RowId,Description,Value
0,NSV,Number of Support Vectors,13.0
1,NSVB,Number of Support Vectors on Boundary,13.0
2,NDROBS,Number of Dropped Observations,0.0
3,THRESH,Threshold R Square Value,0.83819
4,C_R,Constant (C_r) Value,0.16181
5,RTIME,Run Time (seconds),0.685958
6,BCAL,Bandwidth Calculation Time (seconds),0.000199

Unnamed: 0,RowId,Description,Value,nValue
0,NITERS,Number of Iterations,1,1.0
1,OBJ,Objective Value,0.1618096478,0.16181
2,INFEA,Infeasibility,0.0000803413,8e-05
3,OPTSTATUS,Optimization Status,Optimal,
4,DEGEN,Degenerate,No,0.0

Unnamed: 0,RowId,Model,Status
0,Model_1,Model 1,Success


### 6. Score SVDD Model using ASTORE

This model of normal operations is then operationalized in phase 2 for anomaly detection. For each new observation, the distance value is computed and compared to the threshold R-square value. An observation whose distance value exceeds the threshold generally indicates something abnormal in the process.

Load the astore action set.

In [8]:
sess.loadactionset('astore')

NOTE: Added action set 'astore'.


In [9]:
sess.score(
     table={"name":'STREAMTOTALCOST_TRAIN', "caslib":'public'},
     casout='STREAMTOTALCOST_TRAIN_scored',
     rstore='trade_outliers_svdd'
    )

Unnamed: 0,casLib,Name,Rows,Columns,casTable
0,CASUSER(viyademo01),STREAMTOTALCOST_TRAIN_scored,380400,11,"CASTable('STREAMTOTALCOST_TRAIN_scored', casli..."

Unnamed: 0,Task,Seconds,Percent
0,Loading the Store,9.8e-05,0.001161
1,Creating the State,0.002356,0.027906
2,Scoring,0.081757,0.968457
3,Total,0.08442,1.0


### 7. Generate an Analytic Store File

This ASTORE file can then be used in ESPPy for real-time anomaly detection.

In [181]:
#store=sess.download(rstore='trade_outliers_svdd')
#with open('../ahu_svdd.astore','wb') as file:
#   file.write(store['blob'])

In [12]:
#sess.astore.describe(rstore='trade_outliers_svdd')

In [13]:
sess.tableinfo()

Unnamed: 0,Name,Rows,Columns,IndexedColumns,Encoding,CreateTimeFormatted,ModTimeFormatted,AccessTimeFormatted,JavaCharSet,CreateTime,Repeated,View,MultiPart,SourceName,SourceCaslib,Compressed,Creator,Modifier,SourceModTimeFormatted,SourceModTime
0,TRADE_OUTLIERS_SVDD,1,2,0,utf-8,2021-10-01T15:22:44+00:00,2021-10-01T15:22:44+00:00,2021-10-01T15:22:53+00:00,UTF8,1948721000.0,0,0,0,,,0,viyademo01,,,
1,STREAMTOTALCOST_TRAIN_SCORED,380400,11,0,utf-8,2021-10-01T15:22:53+00:00,2021-10-01T15:22:53+00:00,2021-10-01T15:22:53+00:00,UTF8,1948721000.0,0,0,0,,,0,viyademo01,,,


In [14]:
castbls = sess.CASTable("STREAMTOTALCOST_TRAIN_scored")
castbls.head(10)

Unnamed: 0,_opcode,tradeID*,security,quantity,price,totalCost,traderID,time,name,_SVDDDISTANCE_,_SVDDSCORE_
0,ins,197416,SAP,3139,124.295,390162.005,10005,2020-07-20 10:40:25.213615,Jane Bloggs,0.838199,1.0
1,ins,197417,TPC,3841,113.691,436687.131,10003,2020-07-20 10:40:25.213652,Jane Doe,0.838143,-1.0
2,ins,197418,LPX,3291,131.876,434003.916,10005,2020-07-20 10:40:25.213845,Jane Bloggs,0.838139,-1.0
3,ins,197419,TPC,1826,13.688,24994.288,10003,2020-07-20 10:40:25.213940,Jane Doe,0.824121,-1.0
4,ins,197420,TPC,4116,127.336,524114.976,10005,2020-07-20 10:40:25.214141,Jane Bloggs,0.83826,1.0
5,ins,197421,IBM,3902,162.19,632865.38,10005,2020-07-20 10:40:25.214171,Jane Bloggs,0.838336,1.0
6,ins,197422,AOL,2437,89.4593,218012.3141,10002,2020-07-20 10:40:25.214355,John Doe,0.837747,-1.0
7,ins,197423,XOM,1887,62.1869,117346.6803,10002,2020-07-20 10:40:25.214379,John Doe,0.838016,-1.0
8,ins,197424,AOL,1337,187.957,251298.509,10003,2020-07-20 10:40:25.214657,Jane Doe,0.838261,1.0
9,ins,197425,AOL,4788,7.64216,36590.66208,10003,2020-07-20 10:40:25.214676,Jane Doe,0.822614,-1.0
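
The scored rows above suggest that `_SVDDSCORE_` is 1 when `_SVDDDISTANCE_` exceeds the threshold R-square value and -1 otherwise. The following sketch checks that reading against the ten rows shown. It assumes the rounded threshold 0.83819 printed in the training output, so rows very close to the boundary might not match on other data. The helper `flag` is a hypothetical name introduced for illustration.

```python
# (tradeID*, _SVDDDISTANCE_, _SVDDSCORE_) copied from the ten scored rows shown above.
scored_rows = [
    (197416, 0.838199, 1.0),
    (197417, 0.838143, -1.0),
    (197418, 0.838139, -1.0),
    (197419, 0.824121, -1.0),
    (197420, 0.83826, 1.0),
    (197421, 0.838336, 1.0),
    (197422, 0.837747, -1.0),
    (197423, 0.838016, -1.0),
    (197424, 0.838261, 1.0),
    (197425, 0.822614, -1.0),
]

# Rounded "Threshold R Square Value" from the training output (an approximation).
THRESHOLD_R2 = 0.83819


def flag(distance, threshold=THRESHOLD_R2):
    """Return 1.0 when the distance exceeds the threshold (outlier), else -1.0."""
    return 1.0 if distance > threshold else -1.0


flags = [flag(distance) for _, distance, _ in scored_rows]

# Compare the recomputed flags with the _SVDDSCORE_ values produced by the model.
matches = all(f == score for f, (_, _, score) in zip(flags, scored_rows))
outlier_ids = [trade_id for (trade_id, _, _), f in zip(scored_rows, flags) if f == 1.0]

print(matches)      # True
print(outlier_ids)  # [197416, 197420, 197421, 197424]
```

For these ten rows the recomputed flags agree with `_SVDDSCORE_`, which is consistent with the threshold rule described in section 6.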


### 8. Register model in model repository

Register the trained model in the model repository using the `sasctl` package.

In [15]:
from sasctl import Session
from sasctl.tasks import register_model
from sasctl.services import model_repository,projects

astore = sess.CASTable('trade_outliers_svdd')

with Session('http://frasepviya35smp', 'viyademo01', password):
    # Delete the project if it already exists, then recreate it and register the model
    existing_proj = model_repository.get_project('Trade_ML002')
    if existing_proj is not None:
        model_repository.delete_project(existing_proj)
    model_repository.create_project('Trade_ML002', 'Public')
    register_model(astore, 'outlier_svdd002', 'Trade_ML002')

NOTE: Added action set 'astore'.
NOTE: Cloud Analytic Services saved the file _2C2F210331924BACB84D50102.sashdat in caslib ModelStore.


### 9. Clean Up Your Project

Terminate the CAS session.

In [16]:
sess.close()