One significant part of Orion is user interaction: users can annotate signals through MTV.
We can use these annotations to improve future anomaly detection.
A simple proposal for how this workflow could look:
For each signal in the signalset of the datarun:

- run the pipeline on the signal and find anomalies
- get all known events from the database that are related to the signal and have an annotation tag
- for each known event:
  - get the aggregated signal (from the intermediate outputs) of the datarun where the known event was found
  - get the shape of the sequence that was marked as anomalous in the known event
  - compare this shape to the aggregated signal of the current datarun using a specified method (e.g. DTW) and check whether some subsequence is significantly closer than the others (see the sketch after this list)
  - if there is a similar sequence, add an event with source 'shape matching' and an annotation tag similar to the tag of the original event
  - if any anomaly found in the current datarun overlaps with the known event, remove it from the list of found anomalies
- add all remaining found anomalies as events with source 'orion'
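
As a rough illustration of the shape matching step above, the sketch below slides a window over the aggregated signal and scores each candidate subsequence against the annotated shape with a small, self-contained DTW implementation. The names (`dtw_distance`, `find_similar_subsequences`) and the "half of the mean distance" threshold are hypothetical choices used only to make the idea concrete; they are not part of Orion.

```python
import numpy as np


def dtw_distance(first, second):
    """Plain O(len(first) * len(second)) dynamic time warping distance."""
    n, m = len(first), len(second)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = abs(first[i - 1] - second[j - 1])
            cost[i, j] = dist + min(cost[i - 1, j],       # insertion
                                    cost[i, j - 1],       # deletion
                                    cost[i - 1, j - 1])   # match
    return cost[n, m]


def find_similar_subsequences(aggregated, shape, threshold=None):
    """Return (start, score) pairs for subsequences of ``aggregated`` whose
    DTW distance to ``shape`` is significantly lower than average."""
    window = len(shape)
    scores = [
        (start, dtw_distance(aggregated[start:start + window], shape))
        for start in range(len(aggregated) - window + 1)
    ]

    if threshold is None:
        # heuristic: "significantly closer" = below half of the mean distance
        threshold = 0.5 * float(np.mean([score for _, score in scores]))

    return [(start, score) for start, score in scores if score <= threshold]
```

Each returned start index would then become an event with source 'shape matching', carrying an annotation tag similar to the one of the original known event.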
@sarahmish came up with a first skeleton of how we could implement that in Orion:
```python
from orion.explorer import OrionExplorer


class OrionFeedback:
    """This class manages the annotated events for a specified signal from
    MTV and incorporates them back into Orion.

    Should this inherit from the OrionExplorer?
    """

    def execute_feedback(self, datarun, signal_id):
        """This is the main method for feedback.

        It loads the specified signal, fetches its known events, applies
        shape matching to the known events, then returns a resolved list
        of labels for the signal.

        Attributes:
            - datarun: the datarun with the annotations
            - signal_id: the specific signal in the datarun for which we are
              executing feedback
        """
        signal = self.get_signal(signal_id)
        known_events = self.get_known_events(datarun, signal_id)

        matched_events = []
        for event in known_events:
            matched_events.extend(self.shape_matching(signal, event))

        # resolve overlapping events before returning them
        # priority: user > orion
        #           user > shape_matching
        #           shape_matching > orion
        #
        # if same priority:
        #     user ? user
        #     shape_matching.match_score ? shape_matching.match_score
        #     keep the higher score
        #
        # or have overlap favour anomalies over normal labels generally.
        return self.overlap(matched_events + known_events)

    def shape_matching(self, signal, segment, method="dtw"):
        """This method returns the similarity scores between the signal and
        the segment.

        Attributes:
            - signal: the signal data
            - segment: the segment we want to match
            - method: the algorithm used (e.g. "dtw")
        """
        pass

    def overlap(self, shapes):
        """This method returns the shapes after removing overlapping ones,
        keeping the better scored ones.

        Attributes:
            - shapes: a list of dicts, each with an ``id`` (start index),
              a ``window`` (length) and a ``cost`` (matching score)
        """
        # consider the best (lowest cost) matches first
        shapes = sorted(shapes, key=lambda shape: shape["cost"])

        no_overlap = []
        for first_shape in shapes:
            first_range = range(first_shape["id"],
                                first_shape["id"] + first_shape["window"])

            overlaps = False
            for second_shape in no_overlap:
                second_range = range(second_shape["id"],
                                     second_shape["id"] + second_shape["window"])
                if set(first_range).intersection(second_range):
                    overlaps = True

            if not overlaps:
                no_overlap.append(first_shape)

        return no_overlap

    # helper functions

    def get_signal(self, signal):
        """Get a signal from mongodb."""
        # exists in OrionExplorer
        pass

    def get_pipeline(self, pipeline):
        """Get a pipeline from mongodb."""
        # exists in OrionExplorer
        pass

    def get_known_events(self, datarun, signal):
        """Get the registered events for a particular signal in a given
        datarun from mongodb."""
        pass
```
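
To make the priority rules sketched in the comments of `execute_feedback` more concrete, here is a minimal sketch of how two overlapping events from different sources could be resolved. The `SOURCE_PRIORITY` table and the event dictionaries (with `source` and `score` keys) are assumptions for illustration; the actual event documents in Orion may look different.

```python
# assumed ranking of event sources: user > shape matching > orion
SOURCE_PRIORITY = {'user': 2, 'shape matching': 1, 'orion': 0}


def resolve_overlap(first, second):
    """Pick which of two overlapping events to keep.

    ``first`` and ``second`` are assumed to be dicts with ``source`` and
    ``score`` keys; the higher-priority source wins, and ties are broken
    by the higher score.
    """
    first_priority = SOURCE_PRIORITY[first['source']]
    second_priority = SOURCE_PRIORITY[second['source']]

    if first_priority != second_priority:
        return first if first_priority > second_priority else second

    # same priority (e.g. two 'shape matching' events): keep the higher score
    return first if first.get('score', 0) >= second.get('score', 0) else second
```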
Some points that should be discussed:

- Should this class be a subclass of the OrionExplorer, since it requires much of the same functionality?
- How do we handle cases where multiple users annotated a sequence with different labels?
- Should we use raw or aggregated signals for the shape matching?
- What methods can be used for the shape matching besides DTW? Based on user annotations, can we use supervised (and maybe online) machine learning methods for subsequence classification?
The feedback class benefits a lot from the functionality of OrionExplorer (almost all the helper functions). However, in the proposed form of the class, we treat feedback as merely a general function that binds together existing methods (except shape matching). Expanding on the same point, since shape matching is also required by MTV, maybe we should make the feedback part of the OrionExplorer.
If such a conflict occurs, we can handle the case in three ways:

- handle each user separately without merging; this returns little information to Orion.
- use all annotations by users and take the most severe case to be the "true" case.
- use all annotations by users whilst having a ranking for users; e.g. if user A has a higher rank than user B and their annotations contradict each other, the feedback will use the annotation made by user A (see the sketch below).
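
As a sketch of the third option, conflicting annotations could be resolved with a simple rank table; the `USER_RANK` mapping and the annotation dictionaries below are hypothetical and only illustrate the idea.

```python
# hypothetical ranking of users; a higher value wins a conflict
USER_RANK = {'user_a': 2, 'user_b': 1}


def resolve_conflicting_annotations(annotations):
    """Keep the tag of the highest-ranked user among conflicting annotations.

    ``annotations`` is assumed to be a list of dicts with ``user`` and
    ``tag`` keys; unknown users get the lowest rank.
    """
    best = max(annotations, key=lambda annotation: USER_RANK.get(annotation['user'], 0))
    return best['tag']
```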
This is a valid concern in general: if we implement the shape matching algorithm on the raw data whilst displaying the aggregated version, the user could be confused. That said, applying shape matching on the raw data could provide interesting results.
I think a basic shape matching algorithm will suffice in this phase of the feedback; eventually, when the model is executed again, it should take the pattern of the newly labelled data into consideration and mark other similar shapes with the same tag. I believe introducing another model in this part would increase the error rate of the subsequent model.