Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Extract a new class for entity frequenty tracking #389

Merged
merged 2 commits into from
Feb 25, 2021

Conversation

kaituo
Copy link
Member

@kaituo kaituo commented Feb 22, 2021

Issue #, if available:

Description of changes:
HC detectors use a 1-pass algorithm for estimating heavy hitters in a stream. Our method maintains a time-decayed count for each entity, which allows us to compare the frequencies of entities from different detectors in the stream. To reuse the code in historical detectors, I created a new class PriorityTracker and moved all related logic there. When an entity is hit, the caller can call PriorityTracker.updatePriority to update the entity's priority. The callers can find the most frequently occurring entities in the stream using PriorityTracker.getTopNEntities.

This PR also adds tests for NodeStateManager.

Testing done:

  1. manually tested basic workflow of HC detectors still works.
  2. added new tests for PriorityTracker.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

HC detectors use a 1-pass algorithm for estimating heavy hitters in a stream. Our method maintains a time-decayed count for each entity, which allows us to compare the frequencies of entities from different detectors in the stream. To reuse the code in historical detectors, I created a new class PriorityTracker and moved all related logic there.  When an entity is hit, the caller can call PriorityTracker.updatePriority to update the entity's priority.  The callers can find the most frequently occurring entities in the stream using PriorityTracker.getTopNEntities.

This PR also adds tests for NodeStateManager.

Testing done:
1. manually tested basic workflow of HC detectors still works.
2. added new tests for PriorityTracker.
@codecov
Copy link

codecov bot commented Feb 22, 2021

Codecov Report

Merging #389 (ab741f5) into main (a6ce6fc) will increase coverage by 0.00%.
The diff coverage is 96.47%.

Impacted file tree graph

@@            Coverage Diff            @@
##               main     #389   +/-   ##
=========================================
  Coverage     79.14%   79.15%           
- Complexity     2662     2680   +18     
=========================================
  Files           247      248    +1     
  Lines         11717    11749   +32     
  Branches       1008     1010    +2     
=========================================
+ Hits           9274     9300   +26     
- Misses         1964     1971    +7     
+ Partials        479      478    -1     
Flag Coverage Δ Complexity Δ
plugin 79.14% <96.47%> (+<0.01%) 0.00 <27.00> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ Complexity Δ
...stroforelasticsearch/ad/caching/PriorityCache.java 83.60% <87.50%> (+0.06%) 60.00 <1.00> (ø)
...distroforelasticsearch/ad/caching/CacheBuffer.java 74.78% <92.30%> (-4.28%) 36.00 <6.00> (-2.00)
...roforelasticsearch/ad/caching/PriorityTracker.java 98.43% <98.43%> (ø) 20.00 <20.00> (?)
...stroforelasticsearch/ad/AnomalyDetectorRunner.java 40.00% <0.00%> (-6.25%) 6.00% <0.00%> (-1.00%)
...pendistroforelasticsearch/ad/cluster/HashRing.java 78.94% <0.00%> (-5.27%) 10.00% <0.00%> (-1.00%)
...port/SearchAnomalyDetectorInfoTransportAction.java 59.09% <0.00%> (-4.55%) 4.00% <0.00%> (ø%)
...troforelasticsearch/ad/task/ADBatchTaskRunner.java 87.82% <0.00%> (-0.87%) 63.00% <0.00%> (-1.00%)
...pendistroforelasticsearch/ad/NodeStateManager.java 77.45% <0.00%> (+1.96%) 32.00% <0.00%> (+1.00%)
...asticsearch/ad/cluster/ADClusterEventListener.java 94.00% <0.00%> (+2.00%) 15.00% <0.00%> (+1.00%)
... and 1 more

* @param n the number of entities to return. Can be less than n if there are not enough entities stored.
* @return top entities in the descending order of priority
*/
public List<String> getTopNEntities(int n) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this method going to be used elsewhere eventually? So far it's only called by tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, historical detectors are expected to call it.

Copy link
Contributor

@LiuJoyceC LiuJoyceC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than a minor question I had about a currently unused new method, the refactor looks good.

@kaituo kaituo added the enhancement New feature or request label Feb 23, 2021
* @param entityModelId Entity model Id
* @param priority priority
*/
protected void addPriority(String entityModelId, float priority) {
Copy link
Contributor

@ylwu-amzn ylwu-amzn Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial priority of one entity should be 0? I see the line 210 is using new PriorityNode(entityModelId, 0f)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why need to set the key as entityModelId ? Can we just put entity as key?

Copy link
Member Author

@kaituo kaituo Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial priority of one entity should be 0? I see the line 210 is using new PriorityNode(entityModelId, 0f)

Yes

Why need to set the key as entityModelId ? Can we just put entity as key?

You asked me to use this name :) How about entityId?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For realtime HC detector, it stores entity model id (detectorId + "entity" + entity ?). For historical HC detector, plan to use just entity value. As we discussed, as detector will have separate PriorityTracker, so we can just use entity, that can save some memory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

" as detector will have separate PriorityTracker, so we can just use entity, that can save some memory." Could you explain this sentence? I don't understand.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, we store "<detector_id>+<entity_value>_entity" in cache, we can change to "<entity_value>" to avoid saving duplicate information of "<detector_id>" and "_entity" as the cache is on detector level.

* detector is enabled. i - L measures the elapsed periods since detector starts.
* 0.125 is the decay constant.
*
* Since g(p−L) is changing and they are the same for all entities of the same detector,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems all other places in this java doc are using g(i - L), change g(p-L) to g(i - L) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Changed.

@kaituo kaituo changed the title Extrac a new class for entity frequenty tracking Extract a new class for entity frequenty tracking Feb 24, 2021
@kaituo kaituo merged commit 0515a35 into opendistro-for-elasticsearch:main Feb 25, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants