# ITOps Analytics

## Use Case: Incident Root Cause Analysis

Incident reports in ITOps usually state only the symptoms. Identifying the root cause of a symptom quickly is key to reducing resolution times and improving user satisfaction. This sample use case demonstrates an ML/DL-based solution using sample data.

## 1. Get desired libraries


In [118]:
#Install all related packages. If you find additional packages missing, please follow the same technique.

import sys # For using system function variables
import os # For using OS related functions
# Uncomment and run the next line only if the required packages are not installed
#!conda install --yes --prefix {sys.prefix} pandas tensorflow scikit-learn

from platform import python_version #Check for Python version installed on the environment
print("Python Version: " + python_version())

import tensorflow as tf #Check for tensorflow version installed on the environment
print("Tensorflow Version: " + tf.__version__)

Python Version: 3.7.7
Tensorflow Version: 2.1.0


In [120]:
# Check for the directory path
cwd = os.getcwd()

cwd

## 2. Preprocessing Incident Data

### Loading the Dataset

In [121]:
import pandas as pd
#import os
#import tensorflow as tf

#Load the data file into a Pandas Dataframe
symptom_data = pd.read_csv("root_cause_analysis.csv")

#Explore the data loaded
print(symptom_data.dtypes)
symptom_data.head()

ID              int64
CPU_LOAD        int64
MEMORY_LOAD     int64
DELAY           int64
ERROR_1000      int64
ERROR_1001      int64
ERROR_1002      int64
ERROR_1003      int64
ROOT_CAUSE     object
dtype: object


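|   | ID | CPU_LOAD | MEMORY_LOAD | DELAY | ERROR_1000 | ERROR_1001 | ERROR_1002 | ERROR_1003 | ROOT_CAUSE |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | MEMORY |
| 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | MEMORY |
| 2 | 3 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | MEMORY |
| 3 | 4 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | MEMORY |
| 4 | 5 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | NETWORK_DELAY |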


In [122]:
symptom_data.tail(10)

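|   | ID | CPU_LOAD | MEMORY_LOAD | DELAY | ERROR_1000 | ERROR_1001 | ERROR_1002 | ERROR_1003 | ROOT_CAUSE |
|---|---|---|---|---|---|---|---|---|---|
| 990 | 991 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | DATABASE_ISSUE |
| 991 | 992 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | DATABASE_ISSUE |
| 992 | 993 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | DATABASE_ISSUE |
| 993 | 994 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | MEMORY |
| 994 | 995 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | MEMORY |
| 995 | 996 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | DATABASE_ISSUE |
| 996 | 997 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | NETWORK_DELAY |
| 997 | 998 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | MEMORY |
| 998 | 999 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | NETWORK_DELAY |
| 999 | 1000 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | DATABASE_ISSUE |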


#### In the sample dataset above, the input features and target are as follows:

Input Features: CPU_LOAD, MEMORY_LOAD, DELAY, ERROR_1000, ERROR_1001, ERROR_1002, ERROR_1003 (ID is only a record identifier and is not used as a feature)

Target: ROOT_CAUSE

Problem Type: Multi-class classification

Any comparable dataset can be used. The intent here is to get a feel for the approach.

### Convert data

Input data needs to be converted to formats that can be consumed by ML/DL algorithms.

In [123]:
# Convert the data to NumPy arrays, the format passed to Keras in this example

from sklearn import preprocessing

# ROOT_CAUSE is a text column, so it must be converted to numeric values.
# LabelEncoder from scikit-learn maps each distinct ROOT_CAUSE label to an integer.
# Since ROOT_CAUSE is the target (y), it must not be used as an input feature (x1, x2, x3, ...).
# Ref: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
label_encoder = preprocessing.LabelEncoder()
symptom_data['ROOT_CAUSE'] = label_encoder.fit_transform(symptom_data['ROOT_CAUSE'])

# Convert the Pandas DataFrame to a NumPy array using to_numpy()
np_symptom = symptom_data.to_numpy().astype(float)

# Extract the feature variables (X): columns 1 to 7.
# Column 0 (ID) is only a record identifier, so it is left out.
X_train = np_symptom[:,1:8]

# Extract the target variable (Y): column 8, the encoded ROOT_CAUSE
Y_train=np_symptom[:,8]
# One-hot encode the target so that it matches the softmax output layer and the
# categorical_crossentropy loss, using to_categorical from tf.keras.utils (one column per class)
Y_train = tf.keras.utils.to_categorical(Y_train,len(label_encoder.classes_))

print("Shape of feature variables :", X_train.shape)
print("Shape of target variable :",Y_train.shape)

Shape of feature variables : (1000, 7)
Shape of target variable : (1000, 3)
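To make the two encoding steps above more concrete, here is a minimal NumPy-only sketch of what label encoding and one-hot encoding do to the target. The four labels are a made-up sample for illustration, and the sketch imitates the behaviour of `LabelEncoder` and `to_categorical` rather than calling them. It relies on the fact that `LabelEncoder` numbers the classes in sorted order, which is why DATABASE_ISSUE becomes 0, MEMORY 1 and NETWORK_DELAY 2.

```python
import numpy as np

# Hypothetical sample of ROOT_CAUSE labels, for illustration only
labels = np.array(["MEMORY", "NETWORK_DELAY", "DATABASE_ISSUE", "MEMORY"])

# Step 1: label encoding. np.unique returns the sorted class names and,
# with return_inverse=True, the integer index of each label in that list.
classes, codes = np.unique(labels, return_inverse=True)
print(classes)  # ['DATABASE_ISSUE' 'MEMORY' 'NETWORK_DELAY']
print(codes)    # [1 2 0 1]

# Step 2: one-hot encoding. Row i of the identity matrix is the one-hot
# vector for class i, so indexing by the codes yields one row per label.
one_hot = np.eye(len(classes))[codes]
print(one_hot.shape)  # (4, 3)

# Decoding reverses both steps: argmax recovers the integer code,
# and the code indexes back into the class names.
decoded = classes[one_hot.argmax(axis=1)]
print(decoded)
```

The same reasoning explains the target shape of (1000, 3) printed above: one row per incident and one column per root cause class.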


## 3. Building the Model with Keras

In [124]:
# Get required libraries for building model with Keras
from tensorflow import keras
from tensorflow.keras import optimizers
from tensorflow.keras.regularizers import l2

#Training hyperparameters
EPOCHS=20                     # number of full passes over the training data
BATCH_SIZE=100                # number of samples processed per gradient update
VERBOSE=1                     # print training progress for each epoch
OUTPUT_CLASSES=len(label_encoder.classes_) # number of distinct ROOT_CAUSE classes
N_HIDDEN=128                  # number of units in each hidden layer
VALIDATION_SPLIT=0.2          # fraction of the data held out for validation

#Create a Keras sequential model
model = tf.keras.models.Sequential()

#Add a dense layer with ReLU (Rectified Linear Unit) activation
model.add(keras.layers.Dense(N_HIDDEN,
                             input_shape=(7,),
                             name='Dense-Layer-1',
                             activation='relu'))

#Add a second dense layer with the same configuration
model.add(keras.layers.Dense(N_HIDDEN,
                              name='Dense-Layer-2',
                              activation='relu'))

#Add a third layer with softmax activation for categorical prediction
model.add(keras.layers.Dense(OUTPUT_CLASSES,
                             name='Final',
                             activation='softmax'))

#Compile the model with the Adam optimizer and categorical_crossentropy as the loss function
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

#Fit (train) the model
model.fit(X_train,
          Y_train,
          batch_size=BATCH_SIZE,
          epochs=EPOCHS,
          verbose=VERBOSE,
          validation_split=VALIDATION_SPLIT)

model.summary()

Train on 800 samples, validate on 200 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Model: "sequential_14"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
Dense-Layer-1 (Dense)        (None, 128)               1024      
_________________________________________________________________
Dense-Layer-2 (Dense)        (None, 128)               16512     
_________________________________________________________________
Final (Dense)                (None, 3)                 387       
Total params: 17,923
Trainable params: 17,923
Non-trainable params: 0
_________________________________________________________________
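The parameter counts in the summary can be checked by hand. A dense layer has one weight per input-unit pair plus one bias per unit, so the small sketch below recomputes the figures from the layer sizes used in this notebook. `dense_params` is a helper name introduced here for illustration only.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# 7 input features, two hidden layers of 128 units, 3 output classes
layer_sizes = [7, 128, 128, 3]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [1024, 16512, 387]
print(sum(per_layer))  # 17923
```

These values match the Param # column and the total of 17,923 trainable parameters reported by `model.summary()`.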


## 4. Predicting Root Causes

Now that the model is trained, we use it to predict the root cause of new incidents, first for a single incident and then for several incidents in one batch.

In [125]:
#Set the individual symptom flags for a new incident
CPU_LOAD = 1
MEMORY_LOAD = 0
DELAY = 0
ERROR_1000 = 1
ERROR_1001 = 1
ERROR_1002 = 0
ERROR_1003 = 1

# Pass the flags as a 2-D array (one row per incident) to the model's predict_classes function
prediction = model.predict_classes([[CPU_LOAD,MEMORY_LOAD,DELAY,ERROR_1000,ERROR_1001,ERROR_1002,ERROR_1003]])

# Then translate the numeric value into a label using inverse transform function on the encoder
print(label_encoder.inverse_transform(prediction))

['DATABASE_ISSUE']


In [126]:
type(prediction)

numpy.ndarray

In [127]:
# Predicting as a batch
# This is more efficient than predicting one incident at a time
# Pass an array of arrays, one row per incident
print(label_encoder.inverse_transform(
        model.predict_classes([[1,0,0,0,1,1,0],
                                [0,1,1,1,0,0,0],
                                [1,1,0,1,1,0,1],
                                [0,0,0,0,0,1,0],
                                [1,0,1,0,1,1,1]])))

['DATABASE_ISSUE' 'NETWORK_DELAY' 'MEMORY' 'DATABASE_ISSUE'
 'DATABASE_ISSUE']
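`predict_classes` works with the TensorFlow 2.1.0 version used in this notebook, but it was deprecated and then removed in later TensorFlow 2.x releases. On newer versions, a possible alternative is to call `model.predict`, which returns the softmax probabilities, and take the index of the largest probability in each row. The sketch below wraps this in a helper. `predict_root_causes` is a hypothetical name introduced here, not a Keras function, and it assumes the `model` and `label_encoder` objects created earlier are passed in.

```python
import numpy as np

def predict_root_causes(model, label_encoder, incidents):
    """Return the predicted ROOT_CAUSE label for each incident row.

    Hypothetical helper for illustration; not part of Keras or scikit-learn.
    """
    # Shape (n_incidents, n_features); a single incident is still one row
    features = np.asarray(incidents, dtype=float)
    # model.predict returns one row of class probabilities per incident
    probabilities = model.predict(features)
    # The predicted class is the index of the highest probability in each row
    class_indices = np.argmax(probabilities, axis=1)
    # Translate the integer class indices back into text labels
    return label_encoder.inverse_transform(class_indices)
```

With the objects from this notebook, a call such as `predict_root_causes(model, label_encoder, [[1,0,0,1,1,0,1]])` should give the same kind of result as the `predict_classes` cells above, for one incident or for a batch.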
