
[ML] Allow users to annotate ML anomaly results #33376

Closed
droberts195 opened this issue Sep 4, 2018 · 16 comments

@droberts195
Contributor

droberts195 commented Sep 4, 2018

It would be nice if a user who knows the reason for an interesting feature in the ML results were able to annotate it.

To enable this we could add a new annotation result type, similar to this:

    {
      "job_id": "it-ops-metrics",
      "result_type": "annotation",
      "timestamp": 1454944200000,
      "end_timestamp": 1454946000000,
      "annotation": "Datacenter was isolated for failover testing",
      // The following is optional
      "detector_index": 1
    }

By making the detector index optional, the annotation can apply to either the whole job or just a specific detector.

Unlike other ML results, which have timestamp and bucket_span, annotations have timestamp and end_timestamp so that an annotation can span an arbitrary time period.

Originally it was thought that the same functionality could be used by the ML C++ code to add reasons why it created results, but the current thinking is that it is better to have separate functionality for the two use cases, hence elastic/ml-cpp#197 has been raised to discuss labelling by the C++ code.

@elasticmachine
Collaborator

Pinging @elastic/ml-core

@tveasey
Contributor

tveasey commented Sep 4, 2018

We should think about structured input the user can provide here as well. The simplest thing would be something to the effect of "this is important" or "this is not important". It is much harder to infer things from free-text input if we start using this to provide supervision to our models (although having both is no bad thing).
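As a sketch of what structured input could look like, a machine-readable field could sit alongside the free text; the `importance` field name and its values here are hypothetical, not a decided format:

```json
{
  "job_id": "it-ops-metrics",
  "result_type": "annotation",
  "timestamp": 1454944200000,
  "end_timestamp": 1454946000000,
  "annotation": "Datacenter was isolated for failover testing",
  "importance": "not_important"
}
```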

@richcollier
Contributor

Maybe this is also a solution for this ER? https://github.com/elastic/enhancements/issues/3322

@droberts195
Contributor Author

We should think about structured input the user can provide here as well

We talked about this feature in the roadmap discussion in Dublin and decided that initially this will just be annotations by humans for human consumption.

If we ever add user feedback to provide supervision to our models then we can either extend the format or add a new result type to store this feedback separately.

@droberts195
Contributor Author

droberts195 commented Nov 6, 2018

Some possibilities based on a discussion with @peteharverson are:

  1. Allow global annotations, i.e. no job_id field
  2. Allow annotations that apply to multiple jobs, i.e. job_id is an array with multiple values
  3. Allow annotations that apply to a job group, i.e. allow job_group to be specified instead of job_id

If we allow groups we need to decide how and when they'll be expanded to specific job IDs.

This makes the deletion logic quite complicated: annotations should be deleted once every job they relate to has been deleted, but the wide variety of ways of specifying which jobs an annotation relates to makes this determination complex.
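A minimal sketch of the deletion-eligibility check, assuming hypothetical `job_ids` and `job_group` fields for the multi-job and group options above, and honouring the `is_persistent` flag discussed further down the thread:

```python
def annotation_is_deletable(annotation, existing_job_ids, group_members):
    """Return True if every job the annotation refers to has been deleted.

    `annotation` is a dict using the hypothetical fields discussed above:
    - no job reference at all  -> global annotation, never auto-deleted
    - "job_ids": [...]         -> deletable once all listed jobs are gone
    - "job_group": "..."       -> deletable once the group has no members left
    `group_members` maps a group name to the job IDs currently in it.
    """
    if annotation.get("is_persistent"):
        return False  # persistent annotations survive job deletion
    if "job_ids" in annotation:
        return not any(j in existing_job_ids for j in annotation["job_ids"])
    if "job_group" in annotation:
        return not group_members.get(annotation["job_group"])
    return False  # global annotation
```

The group case illustrates the timing question above: checking group membership at deletion time is only possible if groups are expanded lazily rather than at annotation-creation time.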

@sophiec20
Contributor

I wonder if detector_index might also be an optional one?

@droberts195
Contributor Author

I wonder if detector_index might also be an optional one?

We discussed this and decided that detector_index will be the only way to scope an annotation below the job level, rather than by/partition field values.

@droberts195
Contributor Author

We also decided that it would be useful to have persistent annotations that don't get deleted even if all the jobs they relate to are deleted. This would be useful when repeatedly deleting and recreating a job to try out slightly different configurations.

So there should be an is_persistent boolean field in the annotations too.
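Extending the example from the issue description, a persistent annotation might then look like this (the `is_persistent` field name is as proposed above, the rest is the earlier illustrative format):

```json
{
  "job_id": "it-ops-metrics",
  "result_type": "annotation",
  "timestamp": 1454944200000,
  "end_timestamp": 1454946000000,
  "annotation": "Datacenter was isolated for failover testing",
  "is_persistent": true
}
```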

@droberts195
Contributor Author

The other thing we discussed is editing annotations. It will be incredibly frustrating for users if they are not allowed to edit annotations, for example, after making a typo.

This leads on to the question of who can edit which annotations. Ideally there would be object level security such that each user could only edit annotations they added, unless the owning user opened up the permissions to more users. However, we do not currently have object level security. (The same problem applies to job configs for example.)

We should not introduce a form of object level security purely for annotations. Therefore we decided that if you can edit annotations you can edit any annotations.

There is also the question of whether creating and editing annotations should be an admin or user level privilege. Most write actions require admin privileges. However, that would cripple annotations as it's likely they'll be most useful to people looking at the results, and these people do not require admin privileges. Therefore we decided that the machine_learning_user role should allow adding and editing of annotations. (The actions could still be admin actions, but the machine_learning_user role definition should incorporate them. Alternatively the actions could be "monitor" actions, in which case no changes would be required to the definition of machine_learning_user. This is a low level implementation decision that can be deferred.)
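For illustration only, the grant could be expressed as an index-privilege fragment in the Elasticsearch security role format; the index name is a sketch that assumes a dedicated annotations index:

```json
{
  "indices": [
    {
      "names": [ ".ml-annotations*" ],
      "privileges": [ "read", "write" ]
    }
  ]
}
```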

@walterra

walterra commented Nov 16, 2018

Do we want to support additional metadata similar to timestamp/end_timestamp, for example a value range? This would allow annotations based on time and value like in this demo:

[screenshot "annotate-01": demo of an annotation spanning both a time range and a value range]

@droberts195
Contributor Author

for example a value range

I think this only makes sense for an annotation tied to a single job and single detector_index. Therefore the value range would have to be optional, and the UI would have to be capable of rendering annotations that didn't have a value range.

I'm not sure the extra complexity is worth it, but happy to go with other opinions on this if others think it is worthwhile.
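If we did support it, the range could be two optional fields on the document; `value_min`/`value_max` are hypothetical names for illustration:

```json
{
  "job_id": "it-ops-metrics",
  "result_type": "annotation",
  "timestamp": 1454944200000,
  "end_timestamp": 1454946000000,
  "annotation": "Spike during failover testing",
  "detector_index": 1,
  "value_min": 120.0,
  "value_max": 340.5
}
```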

@droberts195
Contributor Author

droberts195 commented Nov 16, 2018

One more problem that emerged is which index to store annotations in when they apply to multiple jobs. The original description of this issue suggested simply a "new result type", but of course different jobs could be using different results indices, so which index should store an annotation that applies to multiple jobs?

Some options are:

  1. Annotations always go in .ml-anomalies-shared no matter which job(s) they relate to
  2. Annotations are considered metadata rather than results, so go in .ml-meta
  3. We have a completely new index, .ml-annotations for annotations

The third option is nice because it makes it easier to grant write access to annotations to the machine_learning_user role without giving it write access to any other type of ML internal data.

@walterra

  • The current state of discussion is that APIs will be exposed via Kibana only; everything necessary to view/add/edit annotations from within the UI should be doable via Kibana without requiring any custom endpoints on the ML Java plugin side.
  • However, automatically generated annotations might be done via the Java plugin; it's just that this functionality is internal only and doesn't need any API endpoints.
  • One thing to consider though: if we store annotations in a custom index like .ml-annotations with special requirements, then at some point an index template needs to be set up, probably via the Elasticsearch plugin.
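A rough sketch of what such an index template could contain; the field names follow the annotation format discussed above, while the mapping types are illustrative and the exact template syntax depends on the ES version:

```json
{
  "index_patterns": [ ".ml-annotations*" ],
  "mappings": {
    "properties": {
      "timestamp": { "type": "date" },
      "end_timestamp": { "type": "date" },
      "annotation": { "type": "text" },
      "job_id": { "type": "keyword" },
      "detector_index": { "type": "integer" },
      "is_persistent": { "type": "boolean" }
    }
  }
}
```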

@droberts195
Contributor Author

  • One thing to consider though: if we store annotations in a custom index like .ml-annotations with special requirements, then at some point an index template needs to be set up, probably via the Elasticsearch plugin.

Yes, we'll definitely add this index in the ES code. We'll also need to add some sort of cleanup code on the ES side.
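A sketch of such cleanup as a delete-by-query request body, built in Python purely for illustration; the field names match the hypothetical annotation format above, and persistent annotations are spared:

```python
def annotation_cleanup_query(deleted_job_id):
    """Build a delete-by-query body that removes non-persistent annotations
    referring to `deleted_job_id`. Field names are illustrative, and a real
    implementation would also handle multi-job and group annotations."""
    return {
        "query": {
            "bool": {
                "filter": [{"term": {"job_id": deleted_job_id}}],
                "must_not": [{"term": {"is_persistent": True}}],
            }
        }
    }
```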

@walterra

FYI, I wrote up some notes on how TypeScript can help us with these kinds of format specifications: https://gist.github.com/walterra/1dd998cf8d14e19f7dd90a98e4d4eb71#file-typescript_interface-md

droberts195 added a commit to droberts195/elasticsearch that referenced this issue Dec 17, 2018
The ML UI now provides the ability for users to annotate
time periods with arbitrary text to add insight to what
happened.

This change makes the backend create the index for these
annotations, together with read and write aliases to
make future upgrades possible without adding complexity
to the UI.

It also adds read and write permission to the index for
all ML users (not just admins).

The spec for the index is in
https://github.com/elastic/kibana/pull/26034/files#diff-c5c6ac3dbb0e7c91b6d127aa06121b2cR7

Relates elastic#33376
Relates elastic/kibana#26034
droberts195 added a commit that referenced this issue Dec 18, 2018
droberts195 added a commit that referenced this issue Dec 18, 2018
@droberts195
Contributor Author

The first cut of annotations is in 6.6. Separate issues can be raised for future enhancements.
