Integrate user feedback into Orion #72

Open
AlexanderGeiger opened this issue Dec 17, 2019 · 1 comment
@AlexanderGeiger (Contributor)

One significant part of Orion is user interaction: users can annotate signals through MTV.
We can use these annotations to improve future anomaly detection.

A simple proposal for how this workflow could look:

  • For each signal in the signalset of the datarun:
    • run pipeline on signal and find anomalies
    • get all known events from the database that are related to the signal and have an annotation tag
    • For each known event:
      • get the aggregated signal (in intermediate outputs) from the datarun where the known event was found
      • get the shape of the sequence that was marked as anomalous in the known event
      • compare this shape to aggregated signal of current datarun using a specified method (e.g. DTW) and check if some subsequence is significantly closer than others
      • if there is a similar sequence, add an event with source 'shape matching' and a corresponding annotation tag that is similar to the tag of the original event
      • if there is any anomaly that was found in the current datarun, which overlaps with the known event, remove it from the list of found anomalies
    • add all remaining found anomalies as an event with source 'orion'
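The shape-comparison step above could be sketched with a plain-Python DTW distance and a sliding window. This is only an illustration of the idea, not Orion code; the function names (`dtw_distance`, `best_matches`) and the fixed threshold are assumptions.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-time-warping distance
    between two 1-D sequences."""
    inf = float("inf")
    dp = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    dp[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three neighbouring alignments
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[len(a)][len(b)]


def best_matches(signal, shape, threshold):
    """Slide the known anomalous shape over the signal and keep the
    windows whose DTW distance falls below the threshold."""
    window = len(shape)
    matches = []
    for start in range(len(signal) - window + 1):
        cost = dtw_distance(signal[start:start + window], shape)
        if cost < threshold:
            matches.append({"id": start, "cost": cost})
    return matches
```

A real implementation would likely use an optimized DTW library and derive the threshold from the distribution of window costs ("significantly closer than others") rather than a fixed constant.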

@sarahmish came up with a first skeleton of how we could implement that in Orion:

```python
from orion.explorer import OrionExplorer


class OrionFeedback:
    """Manage the annotated events for a specified signal from MTV
    and incorporate them back into Orion.

    Open question: should this inherit from OrionExplorer?
    """

    def execute_feedback(self, datarun, signal_id):
        """Main entry point for feedback.

        Loads the specified signal, fetches its known events, applies
        shape matching to each known event, then returns a resolved
        list of labels for the signal.

        Args:
            datarun: the datarun with the annotations
            signal_id: the signal in the datarun for which we execute feedback
        """
        signal = self.get_signal(signal_id)
        known_events = self.get_known_events(datarun, signal_id)

        matched_events = []
        for event in known_events:
            matched_events.extend(self.shape_matching(signal, event))

        # priority: user > orion
        #           user > shape_matching
        #           shape_matching > orion
        #
        # if same priority:
        #           user ? user
        #           shape_matching.match_score ? shape_matching.match_score
        #           keep the higher score
        #
        # or have overlap favour anomalies over normal labels generally.
        return self.overlap(matched_events + known_events)

    def shape_matching(self, signal, segment, method="dtw"):
        """Return the matches between signal and segment together with
        their similarity scores.

        Args:
            signal: the signal data
            segment: the segment we want to match
            method: the algorithm used (e.g. "dtw")
        """
        pass

    def overlap(self, shapes, window=1):
        """Return shapes after removing overlapping components, keeping
        the lower-cost (better-matching) ones.

        Args:
            shapes: a list of dicts with "id" and "cost" keys
            window: the length of each matched shape
        """
        # best matches first, so later overlapping shapes are dropped
        shapes = sorted(shapes, key=lambda x: x["cost"])
        no_overlap = []

        for first_shape in shapes:
            first_range = set(range(first_shape["id"], first_shape["id"] + window))

            overlapped = any(
                first_range.intersection(
                    range(second_shape["id"], second_shape["id"] + window))
                for second_shape in no_overlap
            )
            if not overlapped:
                no_overlap.append(first_shape)

        return no_overlap

    # helper functions
    def get_signal(self, signal):
        """Get signal from mongodb."""
        # exists in OrionExplorer
        pass

    def get_pipeline(self, pipeline):
        """Get pipeline from mongodb."""
        # exists in OrionExplorer
        pass

    def get_known_events(self, datarun, signal_id):
        """Get registered events for a particular signal in a given
        datarun from mongodb.
        """
        pass
```
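The source-priority rule noted in the comments of the skeleton (user > shape_matching > orion, ties broken by score) could be resolved like this. The event fields (`start`, `end`, `source`, `score`) are illustrative assumptions, not the actual Orion schema.

```python
# Higher number wins when two events overlap; assumed source names.
PRIORITY = {"user": 2, "shape matching": 1, "orion": 0}


def overlaps(a, b):
    """Two events overlap when their [start, end] intervals intersect."""
    return a["start"] <= b["end"] and b["start"] <= a["end"]


def resolve(events):
    """Keep, for each group of overlapping events, the one with the
    highest source priority, breaking ties by match score."""
    ordered = sorted(
        events,
        key=lambda e: (PRIORITY[e["source"]], e.get("score", 0.0)),
        reverse=True,
    )
    kept = []
    for event in ordered:
        if not any(overlaps(event, other) for other in kept):
            kept.append(event)
    return kept
```

For example, a user annotation overlapping an orion-detected anomaly would suppress the orion event, while a non-overlapping shape-matching event survives.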

Some points that should be discussed:

  • Should this class be a subclass of the OrionExplorer, since it requires much of the same functionality?
  • How do we handle cases where multiple users annotated a sequence with different labels?
  • Should we use raw or aggregated signals for the shape matching?
  • What methods can be used for the shape matching besides DTW? Based on user annotations, can we use supervised (and maybe online) Machine Learning methods for subsequence classification?
@sarahmish (Collaborator)

By sequential order:

  • the feedback class benefits a lot from the functionality of OrionExplorer (almost all the helper functions). However, in the proposed form of the class, we treat feedback merely as a general function that binds together existing methods (except shape matching). Expanding on the same point, since shape matching is also required by MTV, maybe we should make feedback part of the OrionExplorer.

  • if such a conflict occurs, we can handle the case in three ways:

    • handle each user separately without merging; this returns little information for Orion.
    • use all annotations by users and take the most severe case to be the "true" case.
    • use all annotations by users whilst having a ranking for users; e.g. if user A has a higher rank than user B and their annotations contradict each other, the feedback will use the annotation made by user A.
  • this is a valid concern in general: if we implement a shape matching algorithm on the raw data whilst displaying the aggregated version, the user could be confused. That said, applying shape matching on raw data could provide interesting results.

  • I think a basic shape matching algorithm will suffice in this phase of the feedback; eventually, when the model is executed again, it should take the pattern of the newly labelled data into consideration and mark other similar shapes with the same tag. I believe adding another model in this part would increase the error rate of the consequent model.
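The user-ranking option for contradictory annotations could be sketched as follows. The rank table, user names, and annotation fields are illustrative assumptions.

```python
# Assumed per-user ranks; higher rank wins a conflict.
USER_RANK = {"user_a": 2, "user_b": 1}


def resolve_conflicts(annotations):
    """Pick, per annotated sequence, the tag from the highest-ranked
    user, so contradictory labels collapse to a single 'true' case."""
    resolved = {}
    for ann in annotations:
        seq = ann["sequence_id"]
        best = resolved.get(seq)
        if best is None or USER_RANK[ann["user"]] > USER_RANK[best["user"]]:
            resolved[seq] = ann
    return {seq: ann["tag"] for seq, ann in resolved.items()}
```

The "most severe case" option would instead compare tags on a severity scale rather than users on a rank scale, but the merging loop would look much the same.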
