Slide:1
Good evening, sir.
I am Rishi Sharma, and I have been an intern at Cisco for a month. I am currently in
the third year of my BTech in Computer Science at NIT Karnataka. I will be
presenting the work I did on detecting anomalies in Cisco Webex data.
Slide:2
So the first question that comes to mind is: what is an anomaly?
Slide:3
Consider this image. Our eyes can easily pick out the anomaly in it. An anomaly is
something that does not follow the trend followed by the other things in the same
domain. Here, everyone in the image is a police officer except one, which is a toy.
Hence the toy is considered an anomaly, while the rest are considered normal.
Slide:4
To understand it better, let's consider an example.
Consider the following 10 data points. We can easily see that the data points 100
and 150 differ from the rest. The reason is that the distance from 100 or 150 to
the other points is much larger than the distance between any of the other points.
Hence these two are classified as anomalies, while the rest of the points are
classified as normal.
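The distance argument above can be sketched in a few lines of Python. This is only
an illustration: the values 100 and 150 come from the example, while the other
eight values and the helper name mean_distance are made up for this sketch.

```python
# Hypothetical data: only 100 and 150 come from the example;
# the remaining eight values are invented for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 100, 150]


def mean_distance(i, points):
    """Average absolute distance from points[i] to every other point."""
    others = points[:i] + points[i + 1:]
    return sum(abs(points[i] - x) for x in others) / len(others)


# Map each data point to its average distance from the rest.
distances = {x: mean_distance(i, data) for i, x in enumerate(data)}

# The two points that lie farthest from everything else.
farthest_two = sorted(distances, key=distances.get, reverse=True)[:2]
print(farthest_two)
```

The two points with the largest average distance are the ones we would call
anomalies by eye.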
Slide:5
Consider the following plot. Just by looking at it, we can tell what is an anomaly
and what is normal. The data points that follow the trend are normal, while those
that do not follow it are considered abnormal. So it is very easy for us to
distinguish between normal and anomalous behaviour.
Slide:6
Our data from the Cisco Webex servers includes six metrics, including CPU usage,
memory consumption, IOPS, write latency, and read latency, which we are going to
use to detect anomalies.
Slide:7
But the biggest problem is that we don't know when an abnormal condition will
arise, because these situations are rare. Why is it important to detect them?
Because anomalies can arise at any time and cause problems for our customers, and
giving customers the best experience is Cisco's priority. This is why we are using
machine learning approaches to automate the whole process and alert us whenever
such a situation arises.
Slide:8
These are the steps followed in the literature to detect anomalies and remove
them. First, we preprocess the data into a form that we can feed into the
algorithm. We then use statistical metrics such as the mean and median to train the
algorithm on what a normal pattern is and what is abnormal. After training the
algorithm, we have to decide on a threshold value: a score above the threshold or
below its negative is considered anomalous. Finally, we detect anomalies in new
data based on the chosen threshold. After detecting the anomalies, we use our
domain knowledge to reach the root cause of the problem and try to remove it. So,
now we know the difference between normal and anomalous.
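The train, threshold, and detect steps on this slide can be sketched roughly as
follows. This is a minimal illustration, not the implementation used on the Webex
data: it assumes the mean and standard deviation as the statistics, and the
function names and CPU usage values are hypothetical.

```python
import statistics


def fit_normal(history):
    """Learn what 'normal' looks like from historical values.

    Assumption for this sketch: normal is summarised by mean and standard deviation.
    """
    return statistics.mean(history), statistics.stdev(history)


def detect(new_values, mean, std, threshold=3.0):
    """Return new values whose score is above the threshold or below its negative."""
    return [x for x in new_values if abs((x - mean) / std) > threshold]


# Hypothetical CPU usage readings (%) used as training data.
history = [48, 50, 52, 49, 51, 50, 47, 53]
mean, std = fit_normal(history)

# Hypothetical new readings to check against the learned normal.
anomalies = detect([51, 49, 95, 50], mean, std)
print(anomalies)
```

Finding the root cause of a flagged value is still a manual step that relies on
domain knowledge.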
Slide:9
So it's time to have a look at the algorithms I applied to the Cisco Webex data to
detect anomalies.
I used 3 algorithms:
1) Modified z-score using the median absolute deviation from the median (MAD)
2) Isolation Forest (an unsupervised learning method)
3) Luminol (an open-source library by LinkedIn)
Slide:10
The first algorithm I applied detects anomalies using the modified z-score.
Slide:11
The basic idea of the algorithm is to first find what is normal and then detect
anomalies based on the z-score, which tells us how far a data point is from the
median, measured relative to the median of the absolute deviations.
Slide:12
Let's have a look at the following example for a better understanding.
First, we calculate the median of the data points.
Then we calculate the deviation of each data point from that median.
Next, we find the median of the absolute values of these deviations. This value is
called the MAD (median absolute deviation from the median).
The MAD is what we treat as normal: it represents the typical deviation in our
dataset, and each data point is compared against it to classify the point as
normal or anomalous. More about the algorithm can be found here.
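The three steps above can be sketched in Python as follows. The data values here
are hypothetical (the data points shown on the slide are not part of this script);
only the value 96 is taken from the example discussed later.

```python
import statistics

# Hypothetical data points; only 96 is taken from the example in this talk.
data = [10, 12, 11, 13, 12, 11, 96]

# Step 1: median of the data points.
median = statistics.median(data)

# Step 2: deviation of each data point from the median.
deviations = [x - median for x in data]

# Step 3: median of the absolute deviations, i.e. the MAD.
mad = statistics.median([abs(d) for d in deviations])

print(median, mad)
```

Note that the single large value 96 barely affects the median or the MAD, which is
why they are a robust description of what is normal.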
Slide:13
Now that we have calculated what is normal for our dataset, we calculate the
z-score, which can be done as follows.
In the formula, Mi is the z-score, xi is the data point, x~ is the median, and MAD
is the median absolute deviation from the median. The z-score tells us how far a
data point is from normal, and by comparing it with the threshold we can classify
the data point as normal or anomalous.
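The formula on the slide is not reproduced in this script. Assuming it is the
commonly used form of the modified z-score, it would read as below. The constant
0.6745 is part of that standard form and is an assumption here, not something
stated in the script.

```latex
M_i = \frac{0.6745\,(x_i - \tilde{x})}{\mathrm{MAD}},
\qquad
\mathrm{MAD} = \operatorname{median}_i \left( \lvert x_i - \tilde{x} \rvert \right)
```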
Slide:14
These are the z-scores calculated for our earlier data points, with a threshold of
3. After comparing the scores with the threshold, our algorithm easily detects that
96 is an anomaly, while the rest of the points are normal.
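A minimal sketch of this thresholding step is shown below. It assumes the standard
modified z-score with the constant 0.6745, and the data values other than 96 are
hypothetical, so the scores will not match the ones on the slide.

```python
import statistics


def modified_z_scores(data, k=0.6745):
    """Modified z-score of each point: k * (x - median) / MAD.

    k = 0.6745 is the constant of the standard form (an assumption in this sketch).
    """
    median = statistics.median(data)
    mad = statistics.median([abs(x - median) for x in data])
    if mad == 0:
        raise ValueError("MAD is zero, so the modified z-score is undefined")
    return [k * (x - median) / mad for x in data]


# Hypothetical data points; only 96 is taken from the example in this talk.
data = [10, 12, 11, 13, 12, 11, 96]
threshold = 3

scores = modified_z_scores(data)

# A point is anomalous if its score is above the threshold or below its negative.
anomalies = [x for x, m in zip(data, scores) if abs(m) > threshold]
print(anomalies)
```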
Slide:15
These are the results I got after applying the algorithm to the Webex data.
We can see that all six metrics show anomalies at nearly the same data points.
Since the metrics agree with each other, we can say that the algorithm works
pretty well.
(...)
Slide:16
The second algorithm I used for detection is Isolation Forest. It is a more complex
algorithm. It is built on decision trees and is an unsupervised learning algorithm,
i.e., it does not require any labels to tell it what is an anomaly and what is
normal.
Slide:17
The intuition is as follows.
Anomalies make up a very small percentage of the dataset compared to the normal
data points, and the algorithm takes advantage of this property. It isolates data
points by splitting the dataset repeatedly. Since normal points have many other
points around them, they take many splits to isolate, whereas an anomalous point
lies far from the normal points and is isolated quickly. The algorithm also gives
every data point an anomaly score, which indicates how severe the anomaly is.
You can check more details about the algorithm here.
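The isolation intuition can be illustrated with a simplified one-dimensional
sketch. This is not the full Isolation Forest algorithm (there is no subsampling
and no normalised anomaly score); it only counts how many random splits are needed
to isolate a point. The data values and function names are hypothetical.

```python
import random


def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Count the random splits needed until `point` is alone in its partition."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the side of the split that still contains the point.
    if point < split:
        side = [x for x in data if x < split]
    else:
        side = [x for x in data if x >= split]
    return isolation_depth(point, side, rng, depth + 1, max_depth)


def average_depth(point, data, n_trees=200, seed=0):
    """Average isolation depth of `point` over many random trees."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(n_trees)) / n_trees


# Hypothetical data: eight points close together and one far away.
data = [10, 11, 12, 13, 14, 15, 16, 17, 100]
depths = {x: average_depth(x, data) for x in data}

# The point with the smallest average depth is the easiest to isolate.
most_anomalous = min(depths, key=depths.get)
print(most_anomalous)
```

A shorter average depth means the point is isolated more quickly, which is what the
anomaly score of the real algorithm is based on.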
Slide:18
These are the results I got after applying the algorithm.
They are pretty similar to those obtained with the MAD algorithm.
(....).
Slide:19
The third algorithm I used for detecting anomalies is Luminol, an open-source
library by LinkedIn.
Slide:20
It works much like the earlier algorithms. It takes a dictionary as input, with
timestamps as keys and the metric values as values. It returns the time windows in
which anomalies were found, along with an anomaly score indicating the severity
of each anomaly. More about the library can be found by clicking the link.
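A rough sketch of this input and output is given below. It assumes the
AnomalyDetector interface described in the Luminol README (get_anomalies, and the
start_timestamp, end_timestamp and anomaly_score attributes), which should be
checked against the installed version. The helper names and metric values are
hypothetical.

```python
def to_luminol_input(timestamps, values):
    """Build the {timestamp: value} dictionary that Luminol takes as input."""
    return dict(zip(timestamps, values))


def find_anomaly_windows(series):
    """Return (start, end, score) for each anomalous time window.

    Assumes Luminol's AnomalyDetector interface as described in its README.
    """
    # Imported here so the rest of the sketch runs even without Luminol installed.
    from luminol.anomaly_detector import AnomalyDetector

    detector = AnomalyDetector(series)
    return [
        (a.start_timestamp, a.end_timestamp, a.anomaly_score)
        for a in detector.get_anomalies()
    ]


# Hypothetical metric sampled at integer timestamps.
timestamps = list(range(10))
values = [50, 51, 49, 50, 52, 95, 96, 50, 49, 51]
series = to_luminol_input(timestamps, values)
```

Calling find_anomaly_windows(series) requires Luminol to be installed. Each
returned tuple gives a time window and the score describing how severe the anomaly
in that window is.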
Slide:21
Here are the results after applying the algorithm.
(....)
Slide:22
Here is the implementation of all the algorithms and the data preprocessing that I
did for detecting anomalies.