Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

add AD task and tune detector&AD result data model #329

Merged
merged 7 commits into from
Dec 16, 2020

Conversation

ylwu-amzn
Copy link
Contributor

@ylwu-amzn ylwu-amzn commented Dec 11, 2020

Issue #, if available:

Description of changes:

Currently we only have realtime detector which can take streaming data and detect anomaly in realtime way. For historical detector, user can feed historical data and detect historical anomalies. When user start historical detector, we will start an AD task at backend and run it asynchronously. This PR is to add AD task and tune detector and AD result to support historical detector.

  1. AD Task
    When user start a historical detector, we will create a new AD task. We will record task state, progress and detector snapshot in task.

  2. Detector changes

    • Add detection date range: user can specify the historical date range to run anomaly detection
    • Add detector type: this is to make the search and aggregation easier. When create detector, user doesn't need to fill this field, we will check detection_date_range and catetory_field to decide detector type automatically.
  3. AD result changes

    • Add task id: user can run historical detector multiple times, and for each run we will create a new AD task. We store task id in AD result, so user can query the latest AD task result with task id.

Test

1../gradlew build
2../gradlew integTest -PnumNodes=3

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@codecov
Copy link

codecov bot commented Dec 11, 2020

Codecov Report

Merging #329 (2cd58aa) into master (f442efd) will increase coverage by 0.51%.
The diff coverage is 97.26%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #329      +/-   ##
============================================
+ Coverage     74.39%   74.90%   +0.51%     
- Complexity     2022     2133     +111     
============================================
  Files           201      206       +5     
  Lines          9563     9999     +436     
  Branches        852      892      +40     
============================================
+ Hits           7114     7490     +376     
- Misses         2019     2088      +69     
+ Partials        430      421       -9     
Flag Coverage Δ Complexity Δ
cli 79.27% <ø> (ø) 0.00 <ø> (ø)
plugin 74.56% <97.26%> (+0.57%) 0.00 <80.00> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ Complexity Δ
...stroforelasticsearch/ad/model/AnomalyDetector.java 94.24% <84.61%> (+6.26%) 84.00 <7.00> (+18.00)
...oforelasticsearch/ad/model/DetectionDateRange.java 89.18% <89.18%> (ø) 11.00 <11.00> (?)
...on/opendistroforelasticsearch/ad/model/ADTask.java 100.00% <100.00%> (ø) 57.00 <57.00> (?)
...endistroforelasticsearch/ad/model/ADTaskState.java 100.00% <100.00%> (ø) 1.00 <1.00> (?)
...pendistroforelasticsearch/ad/model/ADTaskType.java 100.00% <100.00%> (ø) 1.00 <1.00> (?)
...forelasticsearch/ad/model/AnomalyDetectorType.java 100.00% <100.00%> (ø) 1.00 <1.00> (?)
...distroforelasticsearch/ad/model/AnomalyResult.java 99.40% <100.00%> (+10.66%) 58.00 <2.00> (+20.00)
...ransport/DeleteAnomalyDetectorTransportAction.java 21.68% <0.00%> (-34.73%) 3.00% <0.00%> (-11.00%)
...opendistroforelasticsearch/ad/util/ParseUtils.java 45.66% <0.00%> (-11.92%) 31.00% <0.00%> (+3.00%) ⬇️
...port/SearchAnomalyDetectorInfoTransportAction.java 59.09% <0.00%> (-4.55%) 4.00% <0.00%> (ø%)
... and 13 more

Copy link
Contributor

@yizheliu-amazon yizheliu-amazon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No major concerns from me.

Copy link
Contributor

@weicongs-amazon weicongs-amazon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just double confirm. for the schema change, we will have a test plan for it, right? We always need be cautious for any schema related change

@ylwu-amzn
Copy link
Contributor Author

ylwu-amzn commented Dec 15, 2020

just double confirm. for the schema change, we will have a test plan for it, right? We always need be cautious for any schema related change

Yes, agree. Have done backward compatibility test based on previous test plans to make sure this change not break current functions. And I have added more test cases for detector and results, improved the branch coverage to >90% and fixed some bugs in old code.

}

private static boolean isRealTimeDetector(DetectionDateRange detectionDateRange) {
return detectionDateRange == null || detectionDateRange.getEndTime() == null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: endTime can't be null, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for initial release, we plan to separate historical detector and realtime detector flow, so it's true the endTime can't be null. Next phase we will build universal flow, then the endTime could be null. If the endTime is null, we will pre-train model with historical data from startTime to now, then start realtime detection with the pre-trained model.

@ylwu-amzn ylwu-amzn merged commit 9c3b972 into opendistro-for-elasticsearch:master Dec 16, 2020
@ohltyler ohltyler added the enhancement New feature or request label Feb 2, 2021
@ylwu-amzn ylwu-amzn added feature new feature and removed enhancement New feature or request labels Feb 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants