Script.txt

Slide:1

Hi Sir, Good Evening.
Myself Rishi Sharma, an intern for a month at Cisco. I am currently doing my BTech 
in Computer Science from NIT Karnataka and currently in 3rd year of it.I will be 
presenting the work which I had done in detecting anomalies from Cisco Webex data.

Slide:2

So first thing comes to our mind is what is an anomaly?

Slide:3

Consider this image from this image our eyes can easily detect what is an anomaly. 
Anomaly is something which does not follow the trend what other things which are 
present in the same domain follows. Here in the image, we can see that except 
one(which is a toy) all other are police member hence the toy is considered as an 
anomaly while rest as normal.

Slide:4

To understand it better, let's consider an example.
So, consider the following 10 data points. we can easily identify that datapoint 
100 and 150 are the ones which differ from the rest of the data points. And the 
reason is if we find the distance of 100 and 150 with the rest of the points it 
will be more as compared to other points. Hence these are classified as anomalies 
while rest of the points are classified as normal.

Slide:5

Consider the following plot, we can easily identify by just seeing that what is an 
anomaly and what is normal. The data points which are following the trend are 
normal while which are not following are considered as abnormal. So it's very easy 
for us to classify between normal and anomalous behaviour.

Slide:6

Our data from Cisco Webex servers include the following six metrics:-
CPU usage, memory consumption, IOPS, write latency, read latency using which we are 
going to detect anomalies.

Slide:7

But the biggest problem is that we don't know when the abnormal condition will 
arise as these situations are rare and why it is important to detect? Because these 
anomalies can arise at any time and will cause problems for our customer(which is 
Cisco's priority to have the best customer experience). So due to this, we are 
using machine learning approaches to automate the whole process and alert us 
whenever such a situation arises.

Slide:8

These are the following steps which are followed in the literature to detect 
anomalies and remove them. First, we will process the data and then process it to 
the form so that we can feed it into the algorithm. We will use statistical metrics 
such as mean, median etc.. to train the algorithm about what is a normal pattern 
and what is abnormal. After training the algorithm we have to decide a threshold 
value, the value above and below the negative of which will be considered as 
anomalous and finally we will detect the anomalies on new data based on the decided 
threshold. After detecting the anomalies with our domain knowledge we will reach to 
the root cause of the problem and hence try to remove it. So these are the following
steps which are followed in the literature to detect and remove anomalies. So, now 
we know what is the difference between normal and anomalous.


Slide:9

So its time to have a look at various algorithms which I have applied to the Cisco 
Webex data to detect anomalies.
I have used 3 algorithms:
1)Modified Z score using Median absolute deviation from the median
2)Isolation forest (an unsupervised learning method)
3)Luminol(an open-source library by LinkedIn)

Slide:10

The first algorithm which I have applied is detecting anomalies using the Z score

Slide:11

The basic idea of the algorithm is to find what is normal first then detecting 
anomalies based on z score which tell us how far a data point from the median of 
the deviations

Slide:12

Let's have a look at the following example for better understanding:
First, we have to calculate the median of the datapoints
Then we will calculate the deviation of the data points from the median calculated. 
Now we will find the median of the absolute values of the deviation. The value 
which we have found is called MAD(median absolute deviation from the median) 
this MAD is what we called as normal. This is the representative of the normal 
of our dataset which will be compared with the rest of the data points to classify 
them as normal or anomalous. More about the algorithm can be found here.

Slide:13
Now we have calculated what is the normal of our dataset we will calculate the z 
score which can be calculated as follows:
In the formula, Mi is z score xi is the datapoint and x~ is median and MAD is a 
median absolute deviation from the median. Z score will tell us how much away from a
data point from the normal and having the value of threshold we can classify a 
data point as normal/anomalous.

Slide:14

These are the Z score calculated for our earlier datapoint and provided threshold is 
3. After comparing with the threshold our algortihm easily detect that 96 is an 
anomaly while rest of the points are normal.

Slide:15

These are the results which I got after applying the algorithm to the Webex data.
We can see that all the six metrics are having anomalies at nearly the same data
points. So we can say that the algorithm works pretty well.
(...)

Slide:16

The second algorithm which I have used for the detection is Isolation forest. It is 
a very complex algorithm. It works on the decision trees and is an unsupervised 
learning algorithm i.e, it does not require any label to tell it what is an anomaly
and what is normal.

Slide:17

The intuition is:
As we know that anomaly percentage is very less in the dataset as compared to the 
normal data points. The algorithm takes advantage of this property. It starts 
isolating the anomalies present in the dataset and since there are more points
around normal points it will take time to isolate them and since the anomalous 
point is away from the normal point and will be isolated quickly. The algorithm 
also gives an anomalous score to every datapoint which will decide how severe is
the anomaly.You can check more details about the algorithm here.

Slide:18

These are the results which I got after applying the algorithm.
And they are pretty similar to those by applying the MAD algorithm.
(....).

Slide:19

The third algorithm which I had used for detecting anomalies is an open-source 
library by LinkedIn names Luminol.

Slide:20

So it also works pretty similar to an earlier algorithm, it will take dictionary as 
an input with keys as timestamps and value as the metric. And result out the time
window where an anomaly is found and also gives back anomaly score for knowing 
the severity of anomaly. More about the library can be known by clicking the link.

Slide:21

Here are the results after applying the algorithm. 
(....)

Slide:22

Here is the implementation of all the algorithm and data preprocessing which I had
done for detecting anomalies.