Queries data from the index when insufficient data in buffer to form a full shingle #176

LiuJoyceC · 2020-06-25T00:05:44Z

Issue #, if available:

Description of changes:
When running an AD on an interval, data points from past consecutive intervals are required in the shingling process. Currently, the AD only checks data points from a buffer which only contains data points for intervals which were previously run. This means for the first several intervals which the AD runs on, or if there were previously missed intervals (for example, due to restarts), the AD is unable to form a shingle and will output no results, even if the data points for the missed intervals are in the data index.

With this change, instead of issuing a query which only retrieves the current data point, a single query to the index will be issued for the current data point and any missing data points from the buffer that are needed to form a full shingle. The results of the query are cached so that if the data point is missing from the index itself, it will not be re-queried later.

Testing:
A comprehensive set of unit tests have been added (8 new tests covering 25 test cases) to test the FeatureManager getCurrentFeatures method under various initial states and query responses.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

wnbts

it seems the unit test is lost from repo migration and further clean up. given the importance of this method, all branches should be covered by unit test.

https://code.amazon.com/reviews/CR-15233815/revisions/2#/diff

9af0080#diff-d5d992d03a1e15cffb9cfc0b32239c89

src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/FeatureManager.java

…a full shingle.

wnbts · 2020-06-30T20:17:28Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/FeatureManager.java

+     * @param listener onResponse is called with unprocessed features and processed features for the current data point
+     * @throws IOException if a user gives wrong query input when defining a detector
+     */
+    public void getCurrentFeatures(


this method should be private to the class.

The reason this is made public is to allow the unit tests to set the initial state of the shingle without caching missing data points during the setup (since the caching would affect the desired initial state of the shingle to be tested).

I can try to find alternative ways to set the initial state of the shingle, in which case I'll just remove the cacheMissingDataPoints parameter (and just always cache missing data points) and remove the method above.

src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/FeatureManager.java

wnbts · 2020-06-30T20:32:46Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/FeatureManager.java

+     * @param endTime end time of the data point in epoch milliseconds
+     * @param cacheMissingDataPoints if set to true, missing data points are remembered and not re-queried later
+     * @param listener onResponse is called with unprocessed features and processed features for the current data point
+     * @throws IOException if a user gives wrong query input when defining a detector


minor. the query is from both the input and the system. if that happens, there will be no recovery for caller. usually those exceptions are caught and wrapped (abstracted) in unchecked exceptions for high-level callers, like in many other programming languages.

Ok, I'll add a try catch block around searchFeatureDao.getFeatureSamplesForPeriods() and pass the error to listener.onFailure() instead, since that's where other failures from searchFeatureDao.getFeatureSamplesForPeriods() are being passed to.

Please make sure AnomalyResultTransportAction.onFeatureResponse can handle the exception you throw. The IOException sounds like sth you should rethrow EndRunException in AnomalyResultTransportAction and let user know.

src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/FeatureManager.java

kaituo · 2020-07-06T17:22:47Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/FeatureManager.java

+     * @param endTime end time of the data point in epoch milliseconds
+     * @param cacheMissingDataPoints if set to true, missing data points are remembered and not re-queried later
+     * @param listener onResponse is called with unprocessed features and processed features for the current data point
+     * @throws IOException if a user gives wrong query input when defining a detector


Please make sure AnomalyResultTransportAction.onFeatureResponse can handle the exception you throw. The IOException sounds like sth you should rethrow EndRunException in AnomalyResultTransportAction and let user know.

1) Allows time jitter up to half an interval. 2) getCurrentFeatures returns all errors to the listener instead of throwing when searchFeatureDao throws. 3) Modified unit tests to not require a public getCurrentFeatures interface that allows for not caching.

…rch/anomaly-detection into fill-shingle-with-history

wnbts · 2020-07-06T23:29:00Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/FeatureManager.java

-    private final Map<String, ArrayDeque<Entry<Long, double[]>>> detectorIdsToTimeShingles;
+    private Map<String, ArrayDeque<Entry<Long, Optional<double[]>>>> detectorIdsToTimeShingles;


minor. this data should still be final.

Good catch. I forgot to add back the final when I was originally experimenting with mocking private variables for unit tests.

wnbts · 2020-07-07T00:02:08Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/FeatureManager.java

-                    .map(points -> new SinglePointFeatures(currentPoint, Optional.of(batchShingle(points, shingleSize)[0])))
-                    .orElse(new SinglePointFeatures(currentPoint, Optional.empty()))
-            );
+        if (shingle.isEmpty() || shingle.getLast().getKey() < endTime || !shingle.getLast().getValue().isPresent()) {


minor. the second condition shingle.getLast().getKey() < endTime might need some relaxation. if the existing buffer is [1min, 2min,..., 8min] and endTime is 8min1sec, the last data point would still be valid for the endTime.

also, this if branch may be merged with the else branch, starting with optional current point and if current point is present, then optional current shingle.

…rch/anomaly-detection into fill-shingle-with-history

wnbts · 2020-07-09T23:43:00Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/FeatureManager.java

+            } catch (IOException e) {
+                listener.onFailure(e);


minor. error handling can change to as kaituo suggested.

wnbts · 2020-07-09T23:47:46Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/FeatureManager.java

+        if (featuresMap.containsKey(endTime)) {
            getProcessedFeatures(shingle, detector, endTime, listener);
        } else {
            listener.onResponse(new SinglePointFeatures(Optional.empty(), Optional.empty()));


getProcessedFeatures should be able to handle both branches.

Are you suggesting to pass featuresMap as a parameter into getProcessedFeatures and check this condition in there?

The condition featuresMap.containsKey(endTime) is the simplest/most concise way to ensure the conditions !shingle.isEmpty() and shingle.getLast().getKey() ≈ endTime (where ≈ means within half an interval away as discussed, which basically means that the current point is present). The reason this check is being done in updateUnprocessedFeatures is because this check is only needed if updateUnprocessedFeatures is invoked, and not needed if getProcessedFeatures is directly invoked by getCurrentFeatures (in which case the missingRanges.size() > 0 already ensures the above conditions, so checking featuresMap is unnecessary).

no, featuremap is not needed. by the end of line 176, the collection shingle should contain exactly shingle sized data points, present or absent, the last being the most ending around endTime. getProcessedFeatures can output the same results for both if and else branches.

With the current implementation (where results are not added to the shingle if searchFeatureDao returns Collections.emptyList()), the shingle is not guaranteed to contain an entry around endTime, which is why this check is currently being done.

However, per your response on the comment below, if we go ahead and cache all missing ranges when searchFeatureDao returns Collections.emptyList(), then the shingle will be guaranteed to have the entry at or near endTime, and then checking the featureMap will no longer be needed.

wnbts · 2020-07-09T23:53:34Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/FeatureManager.java

+        shingle.clear();
+        getFullShingleEndTimes(endTime, getDetectorIntervalInMilliseconds(detector))
+            .filter(time -> featuresMap.containsKey(time))
+            .mapToObj(time -> featuresMap.get(time))


minor. this first filter might not be needed since the map should contains results for all interval, present or absent. to be safe in the unlikely case that map is incomplete, the second get can return an empty if the key is absent using getOrDefault.

It depends on how we want to handle this case: https://github.com/opendistro-for-elasticsearch/anomaly-detection/blob/master/src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/SearchFeatureDao.java#L211-L214. This is the case that could cause some of the times to be missing from featuresMap at this line.

Is aggs == null in the search response equivalent to each queried time range having no data? In other words, do we want to treat Collections.emptyList() equivalent to [Optional.empty(), Optional.empty(), ..., Optional.empty()]? In the case of Collections.emptyList(), do we want to cache the null values for each of the queried time ranges so they will not get re-queried?

If yes, it might be cleaner to handle this logic in the callback passed to searchFeatureDao, so that the logic for determining shingle values based on the search response is located in one function rather than spread across multiple functions.

i haven't seen or know that happens in practice. if that ever happens, caching the results is ok. what i was suggesting is just simplify the implementation to one line like stream.maptoobject(time -> map.getOrDefault(time, <time, empty()>)).

Alright, I will change the implementation to cache results for the missing ranges when the response is an empty list.

kaituo · 2020-07-10T16:56:58Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/FeatureManager.java

-        }
+        Optional<double[]> currentPoint = shingle.peekLast().getValue();
+        listener
+            .onResponse(


The original code is easier to read. Also, you do "currentPoint.map(point -> filterAndFill(shingle, endTime, detector))" without using currentPoint in filterAndFill, which looks strange.

This change was made to address the comments above from wnbts to simplify the code here to not use if else to ensure currentPoint is not an empty Optional. The way this is written is such that if currentPoint is an empty Optional, it triggers the orElse line, so that if the currentPoint is empty, we don't end up with a non-empty array of points in the SinglePointFeatures (if that were to happen, it would be an unexpected change in behavior from before).

The reason the original code didn't need to check that was because currentPoint wasn't an Optional before. Now that we are caching empty responses from the search request to ensure we don't re-query missing data, the points in the shingle are wrapped in Optionals.

Another unexpected change in behavior that this is avoiding is running filterAndFill on a shingle where the current point is not present. This would result in the shingle filling in the missing current point with older data. In the original code, there were checks in getCurrentFeatures and updateUnprocessedFeature to ensure the current point is not missing. These checks were removed to address comments above about simplifying the logic.

The code was modified to no longer have an unused param point in a function, and now currentPoint.isPresent() is explicitly checked to make the intention clear. Does this address the concern?

yes, thanks for the change.

…arch response.

…rch/anomaly-detection into fill-shingle-with-history

…" in Optional map function.

kaituo · 2020-07-13T17:43:36Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/FeatureManager.java

+                    updateUnprocessedFeatures(detector, shingle, featuresMap, endTime, listener);
+                }, listener::onFailure));
+            } catch (IOException e) {
+                listener.onFailure(new EndRunException(detector.getDetectorId(), CommonErrorMessages.INVALID_SEARCH_QUERY_MSG, e, true));


add a function-level comment (e.g.,

anomaly-detection/src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/FeatureManager.java

Line 243 in cd3bdae

* onFailure is called with EndRunException on feature query creation errors

) to say you are gonna throw this exception when certain condition is true?

kaituo · 2020-07-13T18:07:57Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/FeatureManager.java

            if (result.length < shingleSize) {
                result = null;
            }
        }
        return result;
    }

+    private Stream<Entry<Long, Entry<Long, Optional<double[]>>>> getNearbyPointsForShingle(


Have some documentation of this method since the return value is nested and not easy to understand?

kaituo · 2020-07-13T18:40:12Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/FeatureManager.java

@@ -132,81 +133,129 @@ public FeatureManager(
     */
    public void getCurrentFeatures(AnomalyDetector detector, long startTime, long endTime, ActionListener<SinglePointFeatures> listener) {


Add comments on your workflow? Is the following understanding correct?

First, you interpolate using nearest points.
Then, query for missing points.
Finally, do another round of interpolation using nearest points.

The code isn't actually interpolating before querying for missing points. It is meant to account for a small amount of jitter in time (see wnbts's comment above) so that points that are not actually missing aren't re-queried simply because the millisecond timestamp doesn't exactly match up. The fact that we have to account for random time jitter does make the implementation a bit more complicated than the original code (and my original first version of this feature).

For example, let's say the interval is 60,000 milliseconds, and the detector ends up running at time 60,000, time 120,001, and time 179,999 (simply due to random time jitter). At time 120,001, the function shouldn't re-query historical data just because there doesn't exist a data point stored in the shingle buffer whose timestamp matches up to exactly 60,001. It should just recognize that the point at timestamp 60,000 is the point from 1 interval ago. Similarly, at time 179,999, it shouldn't re-query the previous 2 intervals just because there aren't points whose timestamps are exactly 119,999 and 59,999.

Also, what actually gets stored in the shingle buffer (which persists to intervals in the future) is the actual original timestamp of the data point, so no interpolation is being done. In the example above, what gets stored in the shingle is:
[<60000, x>, <120001, y>, <179999, z>]. So the data point at time 60,000 is not getting interpolated to time 60,001. Instead, time 60,000 is just recognized as the timestamp of the previous interval.

The actual interpolation is done after the query for missing points. Here, missing points will be interpolated with points from neighboring intervals up to the configured maxNeighborDistance (which is currently configured to 2 intervals away). So if the shingle is missing an interval, such as [<60000, x>, <179999, z>], then the data point from time 179,999 will get interpolated to time 119,999 as if that were the actual data point from the previous interval, so the resulting double[] from filterAndFill becomes [x, z, z].

First of all, there is no jitter. So the logic is unnecessary. (FYI @wnbts)

You called getNearbyPointsForShingle before running query that actually use nearby points for interpolation, right? getNearbyPointsForShingle returns a map whose key is 60,000, 120,000, 180,000 in your example because the key's source is what getFullShingleEndTimes returns, right?

@kaituo i am not following this entire thread, just comment on the statement that "there is no jiftter". It looks like the time range comes from system clock

anomaly-detection/src/main/java/com/amazon/opendistroforelasticsearch/ad/AnomalyDetectorJobRunner.java

Line 132 in 129d249

Instant executionStartTime = Instant.now();

, which has jitters. Also, scheduled job runs are not exactly evenly spaced for jitters are intentionally added to scheduler.

@wnbts scheduled job's jitter is disabled for AD. It's possible that a scheduled job may not happen exactly as it is scheduled (e.g., system is under heavy load). But in normal cases, the job should run on time.

@LiuJoyceC You can keep your code. That should give some fault tolerance.

You called getNearbyPointsForShingle before running query that actually use nearby points for interpolation, right? getNearbyPointsForShingle returns a map whose key is 60,000, 120,000, 180,000 in your example because the key's source is what getFullShingleEndTimes returns, right?

The key in the map is meant to allow the shingle to match up data points to intervals. The actual timestamp of the data point is stored in the value of the map along with the data point value. In updateUnprocessedFeatures, it is the actual timestamp of the data point that is added to the shingle and persisted to future intervals. So the shingle in the example will contain the timestamps 60,000, 120,001, 179,999. The data points are only associated with their actual timestamps and not imputed to other timestamps at that stage.

You can keep your code. That should give some fault tolerance.

👍

kaituo · 2020-07-13T18:46:39Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/FeatureManager.java

-        }
+        Optional<double[]> currentPoint = shingle.peekLast().getValue();
+        listener
+            .onResponse(


yes, thanks for the change.

…rch/anomaly-detection into fill-shingle-with-history

…a full shingle (#176) Adds feature to query data from the index when insufficient data in buffer to form a full shingle. Query results are cached.

wnbts reviewed Jun 25, 2020

View reviewed changes

src/main/java/com/amazon/opendistroforelasticsearch/ad/feature/FeatureManager.java Show resolved Hide resolved

yizheliu-amazon added the enhancement New feature or request label Jun 25, 2020

LiuJoyceC force-pushed the fill-shingle-with-history branch from e8b0d2a to 84d4a20 Compare June 30, 2020 19:14

Queries data from the index when insufficient data in buffer to form …

f87ee3d

…a full shingle.

LiuJoyceC force-pushed the fill-shingle-with-history branch from 84d4a20 to f87ee3d Compare June 30, 2020 19:18

wnbts reviewed Jun 30, 2020

View reviewed changes

kaituo reviewed Jul 6, 2020

View reviewed changes

LiuJoyceC added 2 commits July 6, 2020 11:49

Merge branch 'master' of https://github.com/opendistro-for-elasticsea…

2a071b6

…rch/anomaly-detection into fill-shingle-with-history

wnbts reviewed Jul 7, 2020

View reviewed changes

LiuJoyceC added 2 commits July 9, 2020 15:41

Addresses minor comments from PR.

d052f05

Merge branch 'master' of https://github.com/opendistro-for-elasticsea…

cb13815

…rch/anomaly-detection into fill-shingle-with-history

wnbts approved these changes Jul 10, 2020

View reviewed changes

kaituo reviewed Jul 10, 2020

View reviewed changes

LiuJoyceC added 3 commits July 10, 2020 16:57

Wraps IOException in EndRunException and caches an empty list from se…

1680fcc

…arch response.

Merge branch 'master' of https://github.com/opendistro-for-elasticsea…

8e1b4fa

…rch/anomaly-detection into fill-shingle-with-history

Wraps IOException in EndRunException and removes unused use of "point…

843ac79

…" in Optional map function.

kaituo reviewed Jul 13, 2020

View reviewed changes

LiuJoyceC added 2 commits July 14, 2020 18:00

Adds comments/function documents for clarification.

cf9efeb

Merge branch 'master' of https://github.com/opendistro-for-elasticsea…

636f293

…rch/anomaly-detection into fill-shingle-with-history

LiuJoyceC force-pushed the fill-shingle-with-history branch from 484df68 to 636f293 Compare July 15, 2020 01:15

Updates line in documentation causing build to fail.

937a742

kaituo approved these changes Jul 15, 2020

View reviewed changes

LiuJoyceC merged commit 44a8bcd into opendistro-for-elasticsearch:master Jul 16, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Queries data from the index when insufficient data in buffer to form a full shingle #176

Queries data from the index when insufficient data in buffer to form a full shingle #176

LiuJoyceC commented Jun 25, 2020 •

edited

wnbts left a comment

wnbts Jun 30, 2020

LiuJoyceC Jun 30, 2020

wnbts Jun 30, 2020

LiuJoyceC Jun 30, 2020

kaituo Jul 6, 2020

kaituo Jul 6, 2020

wnbts Jul 6, 2020

LiuJoyceC Jul 9, 2020

wnbts Jul 7, 2020

wnbts Jul 9, 2020

wnbts Jul 9, 2020

LiuJoyceC Jul 10, 2020

wnbts Jul 10, 2020

LiuJoyceC Jul 10, 2020

wnbts Jul 9, 2020

LiuJoyceC Jul 10, 2020 •

edited

wnbts Jul 10, 2020

LiuJoyceC Jul 10, 2020

kaituo Jul 10, 2020

LiuJoyceC Jul 10, 2020

LiuJoyceC Jul 10, 2020

LiuJoyceC Jul 13, 2020

kaituo Jul 13, 2020

kaituo Jul 13, 2020

LiuJoyceC Jul 13, 2020

kaituo Jul 13, 2020

kaituo Jul 13, 2020

LiuJoyceC Jul 13, 2020

kaituo Jul 13, 2020

wnbts Jul 13, 2020

kaituo Jul 13, 2020

LiuJoyceC Jul 14, 2020

kaituo Jul 13, 2020

		private final Map<String, ArrayDeque<Entry<Long, double[]>>> detectorIdsToTimeShingles;
		private Map<String, ArrayDeque<Entry<Long, Optional<double[]>>>> detectorIdsToTimeShingles;

		@@ -132,81 +133,129 @@ public FeatureManager(
		*/
		public void getCurrentFeatures(AnomalyDetector detector, long startTime, long endTime, ActionListener<SinglePointFeatures> listener) {

Queries data from the index when insufficient data in buffer to form a full shingle #176

Queries data from the index when insufficient data in buffer to form a full shingle #176

Conversation

LiuJoyceC commented Jun 25, 2020 • edited

wnbts left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LiuJoyceC Jul 10, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LiuJoyceC commented Jun 25, 2020 •

edited

LiuJoyceC Jul 10, 2020 •

edited