How to get exact outliers in Univariate Time Series using OutlierDetector? #139

sthirumoorthi · 2021-10-12T14:17:04Z

Hello, I'm evaluating the Kats outlier detection framework for my project, but the model appears to return a range of timestamps around the outlier rather than the exact index. Below is a sample of my dataset.

2019-01-01 | 35
2019-01-02 | 32
2019-01-03 | 30
2019-01-04 | 31
2019-01-05 | 44
2019-01-06 | 29
2019-01-07 | 45
2019-01-08 | 43
2019-01-09 | 500
2019-01-10 | 27
2019-01-11 | 38
.....
I would expect the model to return "500" as the outlier and "2019-01-09" as its date, but it returns the following:
ts_outDetection.outliers[0] ->
[Timestamp('2019-01-06 00:00:00'),
Timestamp('2019-01-07 00:00:00'),
Timestamp('2019-01-08 00:00:00'),
Timestamp('2019-01-09 00:00:00'),
Timestamp('2019-01-10 00:00:00'),
Timestamp('2019-01-11 00:00:00'),
Timestamp('2019-01-12 00:00:00')]

Can someone help me understand the outlier detector concept in Kats, or direct me to the reference documentation (if any), please?
Let me know if you need more details.

MoKazemi9 · 2021-10-12T20:37:57Z

Hi @sthirumoorthi, thanks for opening the issue. Our outlier detection tutorial (section 2, Outlier Detection) should be useful. Please let us know if you have any other questions.
https://github.com/facebookresearch/Kats/blob/main/tutorials/kats_202_detection.ipynb

sthirumoorthi · 2021-10-13T16:26:21Z

Hello, thanks for the quick response. I was already following the outlier detection tutorial at that link. However, the model returns a range of outliers instead of a single outlier value or index.

In my example dataset, I was expecting the model to return only "500" and "73" as outliers (by value or index), but it returned additional values/indexes.

ts_outDetection.outliers[0] ->
[Timestamp('2019-01-06 00:00:00'),
Timestamp('2019-01-07 00:00:00'),
Timestamp('2019-01-08 00:00:00'),
Timestamp('2019-01-09 00:00:00'),
Timestamp('2019-01-10 00:00:00'),
Timestamp('2019-01-11 00:00:00'),
Timestamp('2019-01-12 00:00:00')]

Would it be possible to check the outcome of this model and provide your comments please?

sthirumoorthi · 2021-10-13T18:07:10Z

For further reference, I have included my test dataset (first 20 observations) and highlighted the outliers that were detected by the Kats OutlierDetector model.

MoKazemi9 · 2021-10-13T21:44:23Z

@sthirumoorthi, thanks for sharing your results. Did you transform your data to TimeSeriesData before applying the detector?
from kats.consts import TimeSeriesData

sthirumoorthi · 2021-10-14T13:47:41Z

Hi @MoKazemi9, thanks for checking the details. Yes, I did the transformation before applying the detector.

from kats.consts import TimeSeriesData
from kats.detectors.outlier import OutlierDetector

# Transform the DataFrame into a Kats TimeSeriesData object for outlier detection
birth_ts = TimeSeriesData(df_birth)

# Outlier detection model for the 'Daily Total Female Birth' dataset
ts_outDetection = OutlierDetector(birth_ts, 'additive')
ts_outDetection.detector()

My complete Python file is available in my GitHub repository with the test dataset, for your reference.
https://github.com/sthirumoorthi/TimeSeries-Models/tree/main/FB%20Kats%20with%20Example

sanelemahlalela · 2021-10-15T20:24:25Z

I think @sthirumoorthi is asking why the detector returns the index (time column) instead of the value (value column) for his dataset. If that is the case, @sthirumoorthi, I think you are better off getting back the index. With the index, you can look up the value in your copy of the dataset. In case multiple rows have that particular value (500 in this case), you'll receive multiple indexes. This is my understanding; I might be wrong.
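The index-to-value lookup described here can be sketched with plain pandas. This is only an illustration: the column names `time` and `value` and the `flagged` list are assumptions standing in for the real DataFrame and for `ts_outDetection.outliers[0]`, and the rows are the eleven sample rows posted at the top of the thread.

```python
import pandas as pd

# Sample rows copied from the first post; the column names are assumed.
df = pd.DataFrame(
    {
        "time": pd.date_range("2019-01-01", periods=11, freq="D"),
        "value": [35, 32, 30, 31, 44, 29, 45, 43, 500, 27, 38],
    }
)

# Hypothetical stand-in for ts_outDetection.outliers[0],
# which is a list of pandas Timestamps.
flagged = [pd.Timestamp("2019-01-09")]

# Index by time so the flagged timestamps can be used as row labels.
flagged_values = df.set_index("time").loc[flagged, "value"]
print(flagged_values)
```

Because the lookup is by timestamp, it returns one value per flagged row even if several rows share the same value.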

sthirumoorthi · 2021-10-15T20:49:52Z

Hi @sanelemahlalela, getting the value or the index is not the problem. The model returns a range of values/indexes instead of only the outliers. In the above example, the model returns the timestamps below as outliers, whereas I was expecting only one timestamp ('2019-01-09 00:00:00', value '500').

ts_outDetection.outliers[0] ->
[Timestamp('2019-01-06 00:00:00'),
Timestamp('2019-01-07 00:00:00'),
Timestamp('2019-01-08 00:00:00'),
Timestamp('2019-01-09 00:00:00'),
Timestamp('2019-01-10 00:00:00'),
Timestamp('2019-01-11 00:00:00'),
Timestamp('2019-01-12 00:00:00')]

As far as I understand, there are no hyperparameters to adjust (other than iqr_mult) to improve the accuracy of the model. So I wanted to understand how the outlier detection logic works for this model and what needs to be done to detect only the exact outliers.
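One plausible reading of the seven consecutive timestamps is sketched below. It assumes that the detector removes a centered moving-average trend and then applies an IQR rule to the residuals; that is an assumption about the internals, not something confirmed in this thread. The series is synthetic (not the dataset from the repository), and `iqr_flags` with its `iqr_mult` argument is a hypothetical helper, not a Kats function. The rolling median is a swapped-in alternative for comparison, not what Kats does.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (not the dataset from this thread): a repeating
# pattern of values between 30 and 42 with a single spike on 2019-01-09.
idx = pd.date_range("2019-01-01", periods=60, freq="D")
series = pd.Series([30.0 + 3.0 * (i % 5) for i in range(60)], index=idx)
series.loc[pd.Timestamp("2019-01-09")] = 500.0


def iqr_flags(resid, iqr_mult=3.0):
    """Return timestamps whose residual lies outside the IQR fences."""
    q1, q3 = np.nanpercentile(resid, [25, 75])
    spread = iqr_mult * (q3 - q1)
    return list(resid[(resid < q1 - spread) | (resid > q3 + spread)].index)


# Centered 7-day moving average: the spike inflates the trend for every
# window that contains it, so its neighbours get large negative residuals.
mean_trend = series.rolling(window=7, center=True).mean()
mean_flags = iqr_flags(series - mean_trend)

# Centered 7-day rolling median: a single extreme value barely moves the
# median, so only the spike itself stands out.
median_trend = series.rolling(window=7, center=True).median()
median_flags = iqr_flags(series - median_trend)

print("moving average:", [t.date().isoformat() for t in mean_flags])
print("rolling median:", [t.date().isoformat() for t in median_flags])
```

On this synthetic series the moving-average variant flags the seven days from 2019-01-06 through 2019-01-12, the same window as in the output above, while the rolling-median variant flags only 2019-01-09. If the assumption holds, post-filtering the returned timestamps with a robust residual like this may be one way to narrow the result to the spike itself.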

sanelemahlalela · 2021-10-16T13:44:23Z

I don't get the multiple-timestamp problem when I run the detector on your data.

sthirumoorthi · 2021-10-18T17:52:16Z

@sanelemahlalela, thanks for checking the details. The detector returns the expected results in your case. I'm not sure why it does not work for my test data. Could it be an issue with my test data, or with the way I coded the logic? I double-checked the Python code and couldn't find any issue.

I understand that you picked a portion of my test data for your test. Can you try the actual data from my repository (the CSV file), please?
Both files (the Python code and the CSV file) are available at the link below:
https://github.com/sthirumoorthi/TimeSeries-Models/tree/main/FB%20Kats%20with%20Example

rohanfb closed this as completed Feb 14, 2022
