This notebook demonstrates how anomaly detection on request error counts works.

In [None]:
#r "nuget:YSoft.Rqa.AnomalyDetection.Application"

In [None]:
using YSoft.Rqa.AnomalyDetection.Application.Model;
using YSoft.Rqa.AnomalyDetection.Application.Services;
using YSoft.Rqa.AnomalyDetection.Data.Model.Csv;

In [None]:
var detector = new ErrorAnomalyDetector();
var plotter = new Plotter();

First, let's generate a reference dataset that new data will be compared against.  
The exact numbers don't matter; as an example, let's use 74% of requests without an error, 22% with one error, and 4% with two errors.

In [None]:
var refCount = 5000;
var refErrors = Enumerable.Repeat(0, (int)(refCount * 0.74))
    .Concat(Enumerable.Repeat(1, (int)(refCount * 0.22)))
    .Concat(Enumerable.Repeat(2, (int)(refCount * 0.04)))
    .ToList();
var refRequests = Enumerable.Range(0, refCount).Select(i => new RequestDataPoint { Errors = refErrors[i] });
var refDf = new RequestGroup("MockService", "MockRequestType", refRequests.ToList()).ValidData;
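Every dataset in this notebook is built the same way, by expanding per-class shares into a flat list of error counts. A minimal sketch of a helper for that step follows. `BuildErrors` is a hypothetical name and is not part of the YSoft package; it rounds instead of truncating and fails fast when the shares do not add up to the requested total.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of YSoft.Rqa.AnomalyDetection:
// expands (errorCount, share) pairs into a list of exactly `total` values.
List<int> BuildErrors(int total, params (int ErrorCount, double Share)[] classes)
{
    var errors = new List<int>(total);
    foreach (var (errorCount, share) in classes)
    {
        // Round rather than truncate so floating-point noise cannot drop an element.
        errors.AddRange(Enumerable.Repeat(errorCount, (int)Math.Round(total * share)));
    }

    // Shares that do not sum to 100% would otherwise go unnoticed.
    if (errors.Count != total)
        throw new ArgumentException($"Shares produce {errors.Count} values, expected {total}.");

    return errors;
}

var sample = BuildErrors(5000, (0, 0.74), (1, 0.22), (2, 0.04));
Console.WriteLine(sample.Count); // 5000
```

With such a helper, a mistyped share (for example 0.4 instead of 0.04) raises an exception instead of silently producing a list of the wrong length.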

Now let's demonstrate 3 possible situations.  
1) A new error count appears: the reference contains only 0, 1, or 2 errors, but the input contains e.g. 3 errors.
2) A certain error count occurs more frequently: e.g. 1 error starts occurring in 50% of cases, while it should occur in only around 22% of cases.
3) The data are all right.
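To make the three situations concrete, here is a minimal sketch of how such a check could be written. It is illustrative only and is not the `ErrorAnomalyDetector` implementation: the names `Shares`, `LooksAnomalous`, and `Make` are hypothetical, and it assumes the tolerance is an absolute difference between per-class shares.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: NOT the library's detector.
// Maps each error count to its share of the dataset (0.0 to 1.0).
Dictionary<int, double> Shares(IReadOnlyCollection<int> errors) =>
    errors.GroupBy(e => e)
          .ToDictionary(g => g.Key, g => (double)g.Count() / errors.Count);

bool LooksAnomalous(IReadOnlyCollection<int> input, IReadOnlyCollection<int> reference, double tolerance = 0.10)
{
    var inputShares = Shares(input);
    var referenceShares = Shares(reference);

    // Situation 1: the input contains an error count the reference never saw.
    if (inputShares.Keys.Any(k => !referenceShares.ContainsKey(k)))
        return true;

    // Situation 2: a known class whose share moved by more than the tolerance.
    // A class missing from the input is treated as a share of 0.
    return referenceShares.Any(kv =>
        Math.Abs((inputShares.TryGetValue(kv.Key, out var share) ? share : 0.0) - kv.Value) > tolerance);
}

// Builds 100 requests from per-class counts (zeros, ones, twos, threes).
int[] Make(int zeros, int ones, int twos, int threes = 0) =>
    Enumerable.Repeat(0, zeros)
        .Concat(Enumerable.Repeat(1, ones))
        .Concat(Enumerable.Repeat(2, twos))
        .Concat(Enumerable.Repeat(3, threes))
        .ToArray();

var reference = Make(74, 22, 4);
Console.WriteLine(LooksAnomalous(Make(73, 21, 3, 3), reference)); // True: class 3 is new
Console.WriteLine(LooksAnomalous(Make(76, 24, 0), reference));    // False: class 2 is missing but shares are close
```

Situation 3 is simply the case where neither check fires.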

The following dataset breaks rule 1): a new error count is added.

In [None]:
var count = 1000;
var errors1 = Enumerable.Repeat(0, (int)(count * 0.73))
    .Concat(Enumerable.Repeat(1, (int)(count * 0.21)))
    .Concat(Enumerable.Repeat(2, (int)(count * 0.03)))
    .Concat(Enumerable.Repeat(3, (int)(count * 0.03)))
    .ToList();
var requests1 = Enumerable.Range(0, count).Select(i => new RequestDataPoint { Errors = errors1[i] });
var df1 = new RequestGroup("MockService", "MockRequestType", requests1.ToList()).ValidData;

In [None]:
display(plotter.ErrorBar(df1, refDf, "New error count class emerged in input dataset"))

Check the result.

In [None]:
detector.IsInputAnomalous(df1, refDf)

On the other hand, if an error count class is missing (the input dataset contains no requests with that error count), that's fine as long as the remaining shares stay within the tolerance.

In [None]:
var errors2 = Enumerable.Repeat(0, (int)(count * 0.76)).Concat(Enumerable.Repeat(1, (int)(count * 0.24))).ToList();
var requests2 = Enumerable.Range(0, count).Select(i => new RequestDataPoint { Errors = errors2[i] });
var df2 = new RequestGroup("MockService", "MockRequestType", requests2.ToList()).ValidData;

In [None]:
display(plotter.ErrorBar(df2, refDf, "An error count class didn't occur"))

In [None]:
detector.IsInputAnomalous(df2, refDf)

The following dataset breaks rule 2): the error-count shares vary more than they should.  
The IsInputAnomalous method has an additional parameter specifying the tolerance. The default is 10%.

In [None]:
var errors3 = Enumerable.Repeat(0, (int)(count * 0.62))
    .Concat(Enumerable.Repeat(1, (int)(count * 0.35)))
    .Concat(Enumerable.Repeat(2, (int)(count * 0.03)))
    .ToList();
var requests3 = Enumerable.Range(0, count).Select(i => new RequestDataPoint { Errors = errors3[i] });
var df3 = new RequestGroup("MockService", "MockRequestType", requests3.ToList()).ValidData;

In [None]:
display(plotter.ErrorBar(df3, refDf, "Error rates differ more than they should"))

The reference dataset has 74% of requests without an error, while the input has only 62%. That is a difference of 12 percentage points, which is more than the 10% tolerance, so the input is an anomaly.

In [None]:
detector.IsInputAnomalous(df3, refDf)

To check situation 3), let's be more lenient and increase the tolerance to 15%; the input should no longer be flagged as an anomaly.

In [None]:
detector.IsInputAnomalous(df3, refDf, 0.15)
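The effect of the two tolerances can be checked by hand. The sketch below uses only the shares stated in the text (reference 74/22/4%, input 62/35/3%) and assumes a dataset is flagged when the share of any class moves by more than the tolerance; the actual rule inside `IsInputAnomalous` may differ.

```csharp
using System;
using System.Linq;

// Shares taken from the text: reference 74/22/4 %, input 62/35/3 %.
var referenceShares = new[] { 0.74, 0.22, 0.04 };
var inputShares = new[] { 0.62, 0.35, 0.03 };

// Absolute per-class differences: roughly 0.12, 0.13 and 0.01.
var differences = referenceShares.Zip(inputShares, (r, i) => Math.Abs(r - i)).ToArray();
var largest = differences.Max();

foreach (var tolerance in new[] { 0.10, 0.15 })
{
    // Assumption: flagged when the largest per-class difference exceeds the tolerance.
    Console.WriteLine($"tolerance {tolerance}: anomalous = {largest > tolerance}");
}
```

Under this assumption the largest difference is about 13 percentage points (the one-error class), so the input is flagged at a tolerance of 0.10 and accepted at 0.15.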