# Anomaly Detection (Air Handling Units) - Model Generation

An Air Handling Unit (AHU) regulates and circulates air as part of a heating, ventilating and air-conditioning (HVAC) system. It takes in outside air, regulates it, and supplies it to the building as heated or cooled fresh air, using heating and cooling coils and a supply fan. It is controlled by factors such as outside air temperature, inside air temperature, and room or floor occupancy. In a Smart Campus environment, there is normally one AHU for each floor of every building. Maintenance of these units is critical for efficient HVAC performance. Malfunctions in an AHU are usually caused by interrupted air flow and are flagged as supply fan failures.

We will use the [Support Vector Data Description (SVDD)](https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=casactml&docsetTarget=casactml_svdatadescription_details.htm&locale=en) algorithm, packaged in SAS Visual Data Mining and Machine Learning (VDMML), and deploy it using the SAS Event Stream Processing ESPPy module to detect outliers in real time using streaming data.

Additional resources for this use case can be found at the SAS GitHub page for [Anomaly Detection in Air Handling Units](https://github.com/sassoftware/iot-anomaly-detection-hvac).

### 0. Set Up the Environment

First, import the necessary packages to run this notebook.

In [50]:
import os
import pandas as pd
import swat
import getpass

### 1. Start a SAS Viya CAS Session

The CAS session is used to invoke SAS Visual Data Mining and Machine Learning (VDMML) to train a new model.

In [51]:
os.environ["CAS_CLIENT_SSL_CA_LIST"] = "/opt/sas/viya/config/etc/SASSecurityCertificateFramework/cacerts/trustedcerts.pem"
password=getpass.getpass();

cashost='frasepviya35smp'
casport=5570
sess = swat.CAS(cashost, casport,'viyademo01',password)
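
The host, port, and user name above are hard-coded for this demo environment. As a minimal sketch, the same settings could be read from environment variables instead. The variable names `CAS_HOST`, `CAS_PORT`, and `CAS_USER` and the helper `cas_connection_args` are hypothetical and are not part of SWAT.

```python
import os

def cas_connection_args(env=os.environ):
    """Collect CAS connection settings from environment variables.

    The variable names are hypothetical; adjust them to your deployment.
    """
    return {
        "hostname": env.get("CAS_HOST", "localhost"),
        "port": int(env.get("CAS_PORT", "5570")),
        "username": env.get("CAS_USER"),
    }

# Example with an explicit mapping that mirrors the values used in this notebook
args = cas_connection_args(
    {"CAS_HOST": "frasepviya35smp", "CAS_PORT": "5570", "CAS_USER": "viyademo01"}
)

# In the notebook, the session could then be created as:
# sess = swat.CAS(args["hostname"], args["port"], args["username"], password)
```

The password is still collected interactively with `getpass`, so it never has to be stored in the environment.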



### 2. Data Preparation

Define the data to be used during model generation and import the table action set.

In [52]:
indata_dir="../data"
indata1="ahu_train"
indata2="ahu_scr"

sess.loadactionset(actionset="table")

NOTE: Added action set 'table'.


### 3. Load data into CAS

The data is captured every fifteen minutes from two AHUs over a span of six months. It is split into (1) training data, which represents normal operating conditions and was collected over 1.5 months, and (2) scoring data, which includes anomalous behavior and was collected over 4 months. The training and scoring data sets consist of sensor values such as mixed air temperature, return air temperature, chilled water valve status, duct pressure, supply fan speed, etc.

In [53]:
if not sess.table.tableExists(table=indata1).exists:
    tbl = sess.upload_file(indata_dir+"/"+indata1+".csv", casout={"name":indata1})

NOTE: Cloud Analytic Services made the uploaded file available as table AHU_TRAIN in caslib CASUSER(viyademo01).
NOTE: The table AHU_TRAIN has been created in caslib CASUSER(viyademo01) from binary data uploaded to Cloud Analytic Services.


In [54]:
if not sess.table.tableExists(table=indata2).exists:
    tbl = sess.upload_file(indata_dir+"/"+indata2+".csv", casout={"name":indata2})

NOTE: Cloud Analytic Services made the uploaded file available as table AHU_SCR in caslib CASUSER(viyademo01).
NOTE: The table AHU_SCR has been created in caslib CASUSER(viyademo01) from binary data uploaded to Cloud Analytic Services.


Get Table Information

In [55]:
sess.tableinfo()

Unnamed: 0,Name,Rows,Columns,IndexedColumns,Encoding,CreateTimeFormatted,ModTimeFormatted,AccessTimeFormatted,JavaCharSet,CreateTime,...,Repeated,View,MultiPart,SourceName,SourceCaslib,Compressed,Creator,Modifier,SourceModTimeFormatted,SourceModTime
0,AHU_TRAIN,21546,11,0,utf-8,2021-03-08T10:26:10+01:00,2021-03-08T10:26:10+01:00,2021-03-08T10:26:10+01:00,UTF8,1930815000.0,...,0,0,0,,,0,viyademo01,,2021-03-08T10:26:10+01:00,1930815000.0
1,AHU_SCR,20032,11,0,utf-8,2021-03-08T10:26:12+01:00,2021-03-08T10:26:12+01:00,2021-03-08T10:26:12+01:00,UTF8,1930815000.0,...,0,0,0,,,0,viyademo01,,2021-03-08T10:26:12+01:00,1930815000.0


### 4. Import SVDD action set

In [56]:
sess.loadactionset('svdd')

NOTE: Added action set 'svdd'.


### 5. Train SVDD Model

The SVDD algorithm is a one-class classification technique that is useful in applications where data that belongs to one class is abundant, but data about any other class is scarce or missing. Fraud detection, equipment health monitoring, and process control are some examples of application areas where the majority of the data belongs to one class.

In its simplest form, an SVDD model is obtained by building a minimum-radius hypersphere around the one-class training data. The hypersphere provides a compact spherical description of the training data. This training data description can be used to determine whether a new observation is similar to the training data observations. The distance from any new observation to the hypersphere center is computed and compared with the hypersphere radius. If the distance is more than the radius, the observation is designated as an outlier. Using kernel functions in the SVDD formulation provides a more flexible description of the training data. Such a description is nonspherical and conforms to the geometry of the data. PROC SVDD implements only the flexible data description.
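
The distance test described above can be illustrated with a small NumPy sketch. It is not the SAS implementation. It assumes the standard kernel form of the SVDD distance, dist²(z) = K(z,z) − 2 Σᵢ αᵢ K(xᵢ,z) + Σᵢ Σⱼ αᵢ αⱼ K(xᵢ,xⱼ), and a Gaussian kernel K(x,y) = exp(−‖x−y‖² / (2·bw²)). The support vectors, weights, and bandwidth are made-up toy values.

```python
import numpy as np

def rbf_kernel(a, b, bw):
    """Gaussian (RBF) kernel between every row of a and every row of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * bw ** 2))

def svdd_distance_sq(z, sv, alpha, bw):
    """Squared kernel distance from each row of z to the SVDD center."""
    k_zz = np.ones(len(z))                        # K(z, z) = 1 for the RBF kernel
    k_zs = rbf_kernel(z, sv, bw) @ alpha          # sum_i alpha_i K(x_i, z)
    k_ss = alpha @ rbf_kernel(sv, sv, bw) @ alpha # sum_ij alpha_i alpha_j K(x_i, x_j)
    return k_zz - 2.0 * k_zs + k_ss

# Hypothetical toy model: two symmetric support vectors with equal weights
sv = np.array([[-1.0, 0.0], [1.0, 0.0]])
alpha = np.array([0.5, 0.5])
bw = 1.0

# The threshold is the squared distance of a boundary support vector
threshold = svdd_distance_sq(sv[:1], sv, alpha, bw)[0]

# One point between the support vectors and one far away from them
points = np.array([[0.0, 0.0], [5.0, 5.0]])
dist_sq = svdd_distance_sq(points, sv, alpha, bw)
is_outlier = dist_sq > threshold
print(is_outlier)
```

The point between the two support vectors falls inside the description, while the distant point exceeds the threshold and is flagged as an outlier.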

We are using SVDD-based K-charts to determine anomalous behavior in AHUs. A K-chart is a nonparametric multivariate control chart that is used for statistical process control and can also be used for monitoring equipment health and operating data. It is implemented in two phases. In phase 1, observations from normal operations of the process are collected and used to train an SVDD model and obtain the threshold r-square value.

In [57]:
# Phase 1: Model training
# Run the svddTrain action from the svDataDescription action set on the training data
sess.svDataDescription.svddTrain(
    bw=94,                # RBF kernel bandwidth
    solver='actset',      # active-set solver
    inputs=[{"name":"SUPPL_FAN_SP"},{"name":"DIS_AIR_TEMP"},{"name":"DUCT_PRESS_ACTV"},{"name":"MIXED_AIR_TEMP"},
            {"name":"RTRN_AIR_TEMP"},{"name":"MAX_CO2_VAL"},{"name":"CHW_VALVE"},{"name":"CHW_VALVE_POSIT"}],
    id=["AHU"],
    savestate={"name": "svdd_ahu", "replace": True},
    output={"casout": {"name": "sv", "replace": True}},
    table={"caslib": "casuser", "name": "ahu_train"}
)

NOTE: Using Active Set Solver.
NOTE: SVDDTRAIN runs with default maxtime of 1800 seconds.
NOTE: Beginning data reading...
NOTE: Data reading complete.
NOTE: Starting processing for Model 1 (with bandwidth=94).
NOTE: Ending processing for Model 1 (with bandwidth=94).
NOTE: Output generation complete.
NOTE: Support vector table generation complete.
NOTE: Beginning save state generation...
NOTE: 9709 bytes were written to the table "svdd_ahu" in the caslib "CASUSER(viyademo01)".
NOTE: Save state generation complete.


Unnamed: 0,RowId,Type,N
0,NREAD,Number of Observations Read,21546.0
1,NUSED,Number of Observations Used,21546.0

Unnamed: 0,RowId,Description,Value,nValue
0,OPTMETHOD,Optimization Method,Active Set,
1,KERTYPE,Kernel Type,RBF,
2,BW,RBF Kernel Bandwidth,94,94.0
3,TUNE,Bandwidth Selection Method,User specified,
4,RELSCALE,Bandwidth Relative Scale,1.1417201508,1.14172
5,FRAC,Expected Outlier Fraction,1E-6,1e-06
6,OPTTOL,Optimization Tolerance,0.0001,0.0001
7,NINTVARS,Number of Interval Variables,8,8.0
8,NNOMVARS,Number of Nominal Variables,0,0.0

Unnamed: 0,RowId,Description,Value
0,NSV,Number of Support Vectors,35.0
1,NSVB,Number of Support Vectors on Boundary,35.0
2,NDROBS,Number of Dropped Observations,0.0
3,THRESH,Threshold R Square Value,0.906312
4,C_R,Constant (C_r) Value,0.093688
5,RTIME,Run Time (seconds),0.448475

Unnamed: 0,RowId,Description,Value,nValue
0,NITERS,Number of Iterations,1,1.0
1,OBJ,Objective Value,0.0936875946,0.093688
2,INFEA,Infeasibility,0.0000688954,6.9e-05
3,OPTSTATUS,Optimization Status,Optimal,
4,DEGEN,Degenerate,No,0.0

Unnamed: 0,RowId,Model,Status
0,Model_1,Model 1,Success

Unnamed: 0,casLib,Name,Label,Rows,Columns,casTable
0,CASUSER(viyademo01),sv,,35,11,"CASTable('sv', caslib='CASUSER(viyademo01)')"


### 6. Score SVDD Model using ASTORE

This model of normal operations is then operationalized in phase 2 for anomaly detection. For each new observation, its distance value is computed and compared to the threshold r-square value. Observations for which distance value > threshold r-square generally indicate something abnormal in the process.

Load astore action set

In [58]:
sess.loadactionset('astore')

NOTE: Added action set 'astore'.


In [59]:
sess.score(
     table='ahu_scr',
     out='ahu_scored',
     rstore='svdd_ahu'
    )

Unnamed: 0,casLib,Name,Rows,Columns,casTable
0,CASUSER(viyademo01),ahu_scored,20032,3,"CASTable('ahu_scored', caslib='CASUSER(viyadem..."

Unnamed: 0,Task,Seconds,Percent
0,Loading the Store,0.000158,0.008958
1,Creating the State,0.004228,0.239596
2,Scoring,0.01287,0.729341
3,Total,0.017646,1.0
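
Once the scoring data has been scored, the phase 2 comparison can be sketched in pandas. The rows below are hypothetical stand-ins for the `ahu_scored` table, and the AHU identifiers are invented. The threshold is the Threshold R Square Value reported by the training step. The sketch assumes that `_SVDDDISTANCE_` is reported on the same scale as that threshold.

```python
import pandas as pd

# Threshold R-square value reported by svddTrain in the training output
THRESHOLD = 0.906312

# Hypothetical rows standing in for the scored table; in the notebook the frame
# could instead be fetched with sess.CASTable('ahu_scored').to_frame()
scored = pd.DataFrame({
    "AHU": ["AHU_1", "AHU_1", "AHU_2", "AHU_2"],
    "_SVDDDISTANCE_": [0.42, 0.95, 0.88, 1.10],
})

# Flag observations whose distance exceeds the training threshold
scored["is_outlier"] = scored["_SVDDDISTANCE_"] > THRESHOLD

# Count flagged observations per unit
outliers_per_ahu = scored.groupby("AHU")["is_outlier"].sum()
print(outliers_per_ahu)
```

Grouping by the `AHU` id column gives a quick per-unit view of how often each unit leaves the normal operating region.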


### 7. Generate an Analytic Store File

This ASTORE file can then be used in ESPPy for real-time anomaly detection.

In [11]:
store=sess.download(rstore='svdd_ahu')
with open('/user/my_code/ahu_svdd.astore','wb') as file:
   file.write(store['blob'])

NOTE: 9709 bytes were downloaded from the table "SVDD_AHU" in the caslib "CASUSER(u172762)".


In [12]:
sess.astore.describe(rstore='svdd_ahu')

Unnamed: 0,Key
0,11C2D6ACDFD126F342F1348FCA87C78415390AC2

Unnamed: 0,Attribute,Value
0,Analytic Engine,svdd
1,Time Created,22Oct2020:17:57:02

Unnamed: 0,Name,Length,Role,Type,RawType,FormatName
0,SUPPL_FAN_SP,8.0,Input,Interval,Num,
1,DIS_AIR_TEMP,8.0,Input,Interval,Num,
2,DUCT_PRESS_ACTV,8.0,Input,Interval,Num,
3,MIXED_AIR_TEMP,8.0,Input,Interval,Num,
4,RTRN_AIR_TEMP,8.0,Input,Interval,Num,
5,MAX_CO2_VAL,8.0,Input,Interval,Num,
6,CHW_VALVE,8.0,Input,Interval,Num,
7,CHW_VALVE_POSIT,8.0,Input,Interval,Num,
8,AHU,7.0,Id,,Character,

Unnamed: 0,Name,Length,Type,Label
0,AHU,7.0,Character,
1,_SVDDDISTANCE_,8.0,Num,SVDD Distance
2,_SVDDSCORE_,8.0,Num,SVDD Score


In [60]:
sess.tableinfo()

Unnamed: 0,Name,Rows,Columns,IndexedColumns,Encoding,CreateTimeFormatted,ModTimeFormatted,AccessTimeFormatted,JavaCharSet,CreateTime,...,Repeated,View,MultiPart,SourceName,SourceCaslib,Compressed,Creator,Modifier,SourceModTimeFormatted,SourceModTime
0,AHU_TRAIN,21546,11,0,utf-8,2021-03-08T10:26:10+01:00,2021-03-08T10:26:10+01:00,2021-03-08T10:26:22+01:00,UTF8,1930815000.0,...,0,0,0,,,0,viyademo01,,2021-03-08T10:26:10+01:00,1930815000.0
1,AHU_SCR,20032,11,0,utf-8,2021-03-08T10:26:12+01:00,2021-03-08T10:26:12+01:00,2021-03-08T10:26:29+01:00,UTF8,1930815000.0,...,0,0,0,,,0,viyademo01,,2021-03-08T10:26:12+01:00,1930815000.0
2,SV,35,11,0,utf-8,2021-03-08T10:26:22+01:00,2021-03-08T10:26:22+01:00,2021-03-08T10:26:22+01:00,UTF8,1930815000.0,...,0,0,0,,,0,viyademo01,,,
3,SVDD_AHU,1,2,0,utf-8,2021-03-08T10:26:22+01:00,2021-03-08T10:26:22+01:00,2021-03-08T10:26:29+01:00,UTF8,1930815000.0,...,0,0,0,,,0,viyademo01,,,
4,AHU_SCORED,20032,3,0,utf-8,2021-03-08T10:26:29+01:00,2021-03-08T10:26:29+01:00,2021-03-08T10:26:29+01:00,UTF8,1930815000.0,...,0,0,0,,,0,viyademo01,,,


### 8. Register model in model repository

Register the trained analytic store as a model in the model repository using the sasctl package.

In [62]:
from sasctl import Session
from sasctl.tasks import register_model
from sasctl.services import model_repository,projects

astore = sess.CASTable('svdd_ahu')

with Session('http://frasepviya35smp', 'viyademo01', password):
    # Remove any existing project of the same name so that registration starts clean
    existing_proj = model_repository.get_project('HVAC Outlier detection project')
    if existing_proj is not None:
        model_repository.delete_project(existing_proj)

    # Create the project and register the analytic store as a model in it
    model_repository.create_project('HVAC Outlier detection project', 'Public')
    register_model(astore, 'SVDD HVAC Outlier detection', 'HVAC Outlier detection project')

NOTE: Added action set 'astore'.
NOTE: Cloud Analytic Services saved the file _1EFB2DC302E34B59A6A25269E.sashdat in caslib ModelStore.


### 9. Clean Up Your Project

Terminate the CAS session.

In [63]:
sess.close()