
[Meta] Logging Projects #60391

Closed
22 of 30 tasks
joshdover opened this issue Mar 17, 2020 · 20 comments
Labels
  • enhancement (New value added to drive a business result)
  • Meta
  • Team:Core (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc)
  • Team:ResponseOps (Label for the ResponseOps team, formerly the Cases and Alerting teams)

Comments

@joshdover
Contributor

joshdover commented Mar 17, 2020

⚠️ This issue is WIP and not yet complete ⚠️

This issue is intended to be the source-of-truth for all things logging currently being planned or worked on in Kibana.

Some issues belong to multiple categories or tracks of work here, so there is some duplication.

Logging Projects

Kibana Platform Logger

The Kibana Platform (aka "New Platform") has a new logger and configuration that more closely matches Elasticsearch's usage of log4j. You can read more about its design here.
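As a point of reference, here is a minimal sketch of how a plugin might obtain and use this logger. The import path, plugin name, and context strings are illustrative, and the exact API surface may have shifted since this was written.

    import { PluginInitializerContext, Logger } from 'src/core/server';

    export class MyPlugin {
      private readonly logger: Logger;

      constructor(initializerContext: PluginInitializerContext) {
        // Loggers are hierarchical: this one lives under the plugin's context,
        // so its level and appenders can be configured independently of the root logger.
        this.logger = initializerContext.logger.get();
      }

      public setup() {
        this.logger.debug('setting up');
        this.logger.info('setup complete');
      }
    }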

Related issues:

Audit Logging

Audit logging is a security feature. While audit logging already exists in Kibana, it is currently quite limited. This is a high priority project for Security across the Stack. The current plan is to build new audit logging features on top of the Kibana Platform logger.

Kibana Audit Logging Proposal

Related issues:

Blockers for security to begin building audit logging:

Future needs:

Alert History Log (Event Log)

The Alerting team has built an event_log plugin for recording specific events into a separate Elasticsearch index. Alerts do not yet integrate with this plugin, but integration is planned in the near term.
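For illustration, here is a rough sketch of what recording an event through the event_log plugin might look like. The contract and method names (IEventLogService, getLogger, logEvent), the import path, and the field shapes are assumptions based on the plugin's direction at the time, not a confirmed API.

    // Hypothetical usage from a plugin that depends on the event_log plugin.
    import { IEventLogService } from '../../event_log/server'; // import path is illustrative

    export function recordAlertExecution(eventLog: IEventLogService, alertId: string) {
      // Scope a logger to the "alerting" provider so every event it writes is tagged with it.
      const eventLogger = eventLog.getLogger({ event: { provider: 'alerting' } });

      // Each call indexes one ECS-shaped document into the event log index.
      eventLogger.logEvent({
        '@timestamp': new Date().toISOString(),
        event: { action: 'execute' },
        kibana: {
          saved_objects: [{ type: 'alert', id: alertId }],
        },
        message: `alert ${alertId} executed`,
      });
    }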

Related issues:

Log Ingestion

Monitoring is moving to metricbeat for ingestion of monitoring data into Elasticsearch. Operations would like to do the same with filebeat for ingesting normal logs into Elasticsearch.

It may also make sense to use filebeat for alerting's history log / event log. It solves many of the problems that event log does not currently handle (buffering, exponential backoff, etc.).

If using filebeat fulfills requirements for both use cases, it may make sense to actually combine these into a single log and/or single index in Elasticsearch and filter for each use case at query-time.

Pending Decisions

There are a number of decisions that affect more than one of these projects and need to be made in order to unblock them:

  • Should filebeat be used for normal logs, audit logs, and alerting history logs?
    • If so, should these logs have separate indices in Elasticsearch?
      • Each of these features follows a different privilege model, and having separate indices may make that simpler to enforce. However, it adds additional complexity to Kibana installations and upgrades.
  • How important is it that all logging features go through the same or similar mechanisms?
    • Is the Platform logger flexible enough to support all these use cases?
@joshdover
Contributor Author

If so, should these logs have separate indices in Elasticsearch?

I've been thinking about this more, and I think there is quite a bit of value from having separate indices for different use cases. Beyond the security benefit, we can also have different retention policies per use case. I think it's very likely that we'll want shorter retention for Alerting history than we would want for Audit logs.

If all indices use the same .kibana prefix, I believe we are covered by the existing index permissions granted to the kibana_system role.

The only real downside I can think of is the operational complexity added by needing to migrate or reindex multiple indices for major version upgrades. However, if all of these indices use ECS, I believe schema migrations will be quite rare. This reduces the risk of failed upgrades quite a bit.

One outstanding question is whether these indices need to be "system indices" that are hidden from the user. If so, how much work is needed to support them in the Elasticsearch plugin?

@tylersmalley are there any other concerns that I am not thinking of?

@joshdover joshdover added Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Team:Operations Team label for Operations Team Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more! labels Mar 18, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@elasticmachine
Contributor

Pinging @elastic/kibana-operations (Team:Operations)

@elasticmachine
Contributor

Pinging @elastic/kibana-security (Team:Security)

@elasticmachine
Contributor

Pinging @elastic/kibana-platform (Team:Platform)

@joshdover joshdover added Meta enhancement New value added to drive a business result labels Mar 18, 2020
@kobelb
Contributor

kobelb commented Mar 18, 2020

Should filebeat be used for normal logs, audit logs, and alerting history logs?

I think we should be using filebeat for normal logs and audit logs, but not for populating the "alert history" that will show up integrated within the Alerting application. From my understanding, users should be granted access to the history for an alert if they have access to the alert itself, so these should be stored in "system indices". Is there some benefit that I'm missing from using Filebeat to create these entries as opposed to having Alerting itself insert these documents?

One outstanding question is whether these indices need to be "system indices" that are hidden from the user. If so, how much work is needed to support them in the Elasticsearch plugin?

For the normal logs and the audit logs, I think these should be stored in either "hidden indices" or "data indices". Users should be granted access to them using normal Elasticsearch index privileges, and they should be available for use within applications like Discover, Dashboard, Visualize, Logging, etc. As far as I'm aware, data stored in "system indices" won't ever be accessible directly to end-users using the normal Elasticsearch document APIs.

If we want to ship Filebeat with Kibana, and automatically start shipping the logs to Elasticsearch without any user intervention, we should use "hidden indices" as we can make the assumption that the end-user isn't using them for something different.

If we optionally want to start shipping the logs to Elasticsearch with some user intervention, we can use "hidden indices" as well. However, we could also potentially use "data indices" and allow the end-user to specify the indices this data should be ingested into.

@jbudz
Member

jbudz commented Mar 18, 2020

cc @pmuellr for the alerting question above

@joshdover
Contributor Author

joshdover commented Mar 18, 2020

Is there some benefit that I'm missing from using Filebeat to create these entries as opposed to having Alerting itself insert these documents?

The only benefits I'm aware of are the ones I listed above in the ingestion section:

[Filebeat] solves many of the problems that event log does not currently handle (buffering, exponential backoff, etc.).

We get some benefits out of the box; however, there are some drawbacks. For instance, it may be tricky to present feedback to the alerting plugin on the status of history ingestion.

That said, I'm supportive of treating alerting history differently. It seems to have enough differing requirements that it shouldn't use the same mechanics, at least for the time being. Maybe later down the line it makes sense to consolidate these systems, but I don't think we're there yet.

If we optionally want to start shipping the logs to Elasticsearch with some user intervention, we can use "hidden indices" as well. However, we could also potentially use "data indices" and allow the end-user to specify the indices this data should be ingested into.

I think what is key here is that we don't need to start ingesting logs right away for the Audit Logging MVP. That can be an add-on feature in the future.

There are also some UI features we've talked about using log data for, but I think we should evaluate those from first principles. For example, Saved Object history should probably be part of a larger versioning feature rather than just showing edit log events in the UI.

I think separating these efforts from the start, and then identifying overlap later may be the quicker path to delivering on all of these fronts. There are some obvious things that should share infrastructure, and we should leverage those when we can, but I don't want to serialize everything artificially if it does not provide significant value in the short term.

@kobelb
Contributor

kobelb commented Mar 18, 2020

I think separating these efforts from the start, and then identifying overlap later may be the quicker path to delivering on all of these fronts. There are some obvious things that should share infrastructure, and we should leverage those when we can, but I don't want to serialize everything artificially if it does not provide significant value in the short term.

I think this is a good path forward. You brought up some good points regarding the buffering and exponential backoff which Filebeat has implemented; I don't want to gloss over them entirely. @elastic/kibana-alerting-services do you know how many documents we're talking about being created every time that an alert runs?

@legrego
Member

legrego commented Mar 19, 2020

If all indices use the same .kibana prefix, I believe we are covered by the existing index permissions granted to the kibana_system role.

Is the plan to have filebeat authenticate as the kibana_system user? I feel like that role is overly permissive for what filebeat requires. We might want to consider creating another user/role which is only able to append to these indices (via the create_doc index privilege). I think it's important to constrain the types of operations we are authorized to do. create_doc will allow us to ingest our logs, but prevent both updates and deletes.
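As a concrete sketch of that idea (via the Elasticsearch JS client; the role name and index patterns here are made up for illustration):

    import { Client } from '@elastic/elasticsearch';

    const client = new Client({ node: 'http://localhost:9200' });

    // A write-only role for the log shipper: create_doc permits indexing new
    // documents but not updating or deleting existing ones.
    async function createLogWriterRole() {
      await client.security.putRole({
        name: 'kibana_log_writer', // illustrative name
        body: {
          indices: [
            {
              names: ['.kibana-event-log-*', '.kibana-audit-log-*'], // illustrative patterns
              privileges: ['create_doc'],
            },
          ],
        },
      });
    }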

@mshustov
Contributor

Blockers for security to begin building audit logging:
Allow plugins to register custom logging appenders #53256

@elastic/kibana-security are you going to use a custom appender?

Elasticsearch query log #58086

I'm not sure it's a blocker for Audit Logging. Does the Audit Logger depend on the built-in ES logging format? They overlap, but they have different requirements and output formats.

@joshdover
Shouldn't we add the following issues to the blocker list:

@legrego
Member

legrego commented Mar 19, 2020

@elastic/kibana-security are you going to use a custom appender?

To clarify, an appender is used to direct logs to a specific output (such as a file), right? Are we allowed to create our own instances of the built-in appenders? If so, that might be sufficient. We could direct audit logs to a specific file appender (separate from the "main" file appender the rest of Kibana is using).

I'm not sure it's a blocker for Audit Logging. Does the Audit Logger depend on the built-in ES logging format? They overlap, but they have different requirements and output formats.

Not that I'm aware of. @jportner what do you think?

@jportner
Contributor

I'm not sure it's a blocker for Audit Logging. Does the Audit Logger depend on the built-in ES logging format? They overlap, but they have different requirements and output formats.

Not that I'm aware of. @jportner what do you think?

I don't think that's a blocker for security audit logging. We could certainly augment security audit logs if additional performance data was available, but that's not part of our MVP.

@joshdover
Contributor Author

Allow plugins to register custom logging appenders #53256

I was assuming that an extended version of the ECS layout may be necessary for adding the extra ECS fields. It's possible we can make the regular ECS layout support everything, with the extra data only populated in the Audit log records.

Elasticsearch query log #58086

I'm not sure it's a blocker for Audit Logging. Does the Audit Logger depend on the built-in ES logging format? They overlap, but they have different requirements and output formats.

Good point, but audit logging will need the audit events to be emitted from the ES service. We should have separate issues for those.

👍

I don't think this is a hard requirement for the first phase of Audit Logging, but it will be needed in the second phase.

@mshustov
Contributor

To clarify, an appender is used to direct logs to a specific output (such as a file), right? Are we allowed to create our own instances of the built-in appenders? If so, that might be sufficient. We could direct audit logs to a specific file appender (separate from the "main" file appender the rest of Kibana is using).

I thought we decided to configure the existing File / LogRotation appenders for this. Wouldn't the NP logging hierarchical model allow us to pipe logs to the desired destination without introducing a new API?

I was assuming that an extended version of the ECS layout may be necessary for adding the extra ECS fields. It's possible we can make the regular ECS layout support everything, with the extra data only populated in the Audit log records.

Yes, the Layout defines the output format, not the content, so "the extra data only populated in the Audit log records" sounds like the right move. And to note again: in Elasticsearch, the JSON layout follows the ECS format by default. We should refactor the existing JSON layout to ensure compatibility across the stack.
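To sketch how the hierarchical model could route audit logs without a new API, here is roughly the shape such a configuration might take, shown as the equivalent configuration object. In practice this would live under logging in kibana.yml, and the exact key names (e.g. kind vs. type, context vs. name) changed across versions, so treat this as illustrative only.

    // Illustrative shape only: route everything logged under the "audit" context
    // to a dedicated file appender with the JSON (ECS-compatible) layout, while
    // other contexts keep using the root appenders.
    const loggingConfig = {
      appenders: {
        auditFile: {
          kind: 'file',
          path: './audit.log',
          layout: { kind: 'json' },
        },
      },
      loggers: [
        { context: 'audit', appenders: ['auditFile'], level: 'info' },
      ],
    };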

@pmuellr
Member

pmuellr commented Mar 23, 2020

Some thoughts on the alerting event log, based on conversations above.

I think there is quite a bit of value from having separate indices for different use cases.

Ya, probably. I was thinking the shape of the logging service might be like saved objects, where I can create a separate ES index to serve as the storage but reuse high-level APIs in the logging service. Even if the alerting event log doesn't end up needing a separate index, it's not hard to imagine some other solution wanting one in the future. Something to keep in mind, anyway.

Should filebeat be used for normal logs, audit logs, and alerting history logs?

One of the design alternatives for the event log was to write to a file and ingest w/filebeat, and I think this could be made to work. I ruled it out for the first version of our log as I didn't want to be the plugin that added a 10MB binary to Kibana :-)

The main reason to use filebeat is to get our logger out of the business of buffering log events in case ES goes down. But that's about it. Our planned story is to buffer a small number of events (e.g., 100) in memory, throwing out the oldest events when the buffer gets full. So, kinda lossy. OTOH, if ES is actually down, then alerts and actions aren't going to be running either, so it's not even really clear we'd need this buffer for issues with ES being down. I'm expecting the primary benefit of internally buffering write events is that we can bulk-write them (every couple of seconds) rather than doing a write per event.

I think alerting would be perfectly fine having a nice filebeat ingestion story, once we get there, and can get by with our current "write log entries with JS calls" till then.

The only real downside [of using separate ES indices] I can think of is the operational complexity added by needing to migrate or reindex multiple indices for major version upgrades.

I think the idea of migrating logging indices, or even re-indexing, is something we want to avoid. We are currently going with a story where we create per-stack-version indices, e.g., .kibana-event-log-8.0.0-000001 (the -000001 suffix is an ILM thing), copying what APM is currently doing. And there's an assumption that we will keep the schemas compatible enough that searching across old indices should work, except for searches that might contain new data added to new versions of the log (i.e., only add new fields, never change or delete them). Worst case, we'd need to introspect a bit on existing logs, look at their versions and the date ranges of the data contained in them, and use that to create elaborate searches, or do separate (clunky) searches across the different versions and join the results ourselves (not great, but for time series data, probably OK).

From my understanding, users should be granted access to the history for an alert if they have access to the alert itself, so these should be stored in "system indices".

Probably true that they should be in system indices. However, we're currently going to be in crunch mode for 7.8, where we REALLY need to have some amount of the event log operational, and it doesn't feel like we could ship this as system indices for 7.8.

System indices seem like the right way to go long-term. A purpose-built API would be nice. We'd need to figure out some kind of "ILM"-ish story - it could be pretty simple, like a map of time durations and states: a "warm storage after 1 day, cold storage after 1 week, delete after 2 weeks" kinda thing. We'd manage an actual ILM policy from a constrained API surface we expose to users.
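For example, that "warm after a day, cold after a week, delete after two weeks" idea maps onto an ILM policy roughly like the following sketch (via the Elasticsearch JS client; the policy name and phase actions are illustrative):

    import { Client } from '@elastic/elasticsearch';

    const client = new Client({ node: 'http://localhost:9200' });

    async function putEventLogIlmPolicy() {
      await client.ilm.putLifecycle({
        policy: 'kibana-event-log-policy', // illustrative name
        body: {
          policy: {
            phases: {
              // Roll the write index over daily (or at 50gb), then age it through the tiers.
              hot: { actions: { rollover: { max_age: '1d', max_size: '50gb' } } },
              warm: { min_age: '1d', actions: {} },
              cold: { min_age: '7d', actions: {} },
              delete: { min_age: '14d', actions: { delete: {} } },
            },
          },
        },
      });
    }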

I think separating these efforts from the start, and then identifying overlap later may be the quicker path to delivering on all of these fronts.

Concur. We're on a tight schedule for 7.8 anyway, so it seems unlikely we'd have a generic logger fully operational by then that would also be suitable for alerting.

We should start thinking about what it would take to converge these separate efforts (or maybe just alerting and everything else) into a single story. And I think the "hardest" part will be converging on an extended ECS schema for Kibana. We probably want to start thinking about what our extension story is, so we can make sure it will work for alerting, etc. We already have a few Kibana extensions in our small ECS subset:

"kibana": {
"properties": {
"server_uuid": {
"type": "keyword",
"ignore_above": 1024
},
"namespace": {
"type": "keyword",
"ignore_above": 1024
},
"saved_objects": {
"properties": {
"store": {
"type": "keyword",
"ignore_above": 1024
},
"id": {
"type": "keyword",
"ignore_above": 1024
},
"type": {
"type": "keyword",
"ignore_above": 1024
}
},
"type": "nested",
"dynamic": "strict"
}
},
"dynamic": "strict"
}

Note that the saved_objects property is intended to be a primary search key through the event log - you would typically only be able to see the history of an alert/action if you can "SEE" the alert/action (security-wise), and so have access to the saved object type/id. We're not yet sure if we really need multiple of these (hence the nested type), and if we do need multiple, whether we could simplify this down to a string type (URL-ification of saved object references: a space/type/id kind of thing).
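For example, pulling up the history of one alert would be a nested query on that key, roughly like this sketch (the index pattern and field values are illustrative):

    import { Client } from '@elastic/elasticsearch';

    const client = new Client({ node: 'http://localhost:9200' });

    async function findAlertHistory(alertId: string) {
      const { body } = await client.search({
        index: '.kibana-event-log-*', // illustrative pattern
        body: {
          sort: [{ '@timestamp': 'desc' }],
          query: {
            // The nested query scopes both terms to the same saved_objects entry.
            nested: {
              path: 'kibana.saved_objects',
              query: {
                bool: {
                  filter: [
                    { term: { 'kibana.saved_objects.type': 'alert' } },
                    { term: { 'kibana.saved_objects.id': alertId } },
                  ],
                },
              },
            },
          },
        },
      });
      return body.hits.hits;
    }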

@elastic/kibana-alerting-services do you know how many documents we're talking about being created every time that an alert runs?

How long is a piece of string? Users could have thousands of alerts that go off once per second, each scheduling thousands of actions to run. Or just a handful. No idea, really. There are some known customers who use a lot of ES watches, and we've been keeping those in mind in terms of needing to support that kind of scale. In theory, alerting will be "simpler" for customers to use than Watcher, so you'd think we'll probably have customers creating more alerts than they created watches.

@gmmorris
Contributor

gmmorris commented Apr 8, 2020

Hey,
Just pinging here as we're making a change to the top-level kibana object which @pmuellr describes above.

We're changing the saved_objects object so that each saved object can have its own namespace, in preparation for #27004.

You can see this change in the PR here:

https://github.com/elastic/kibana/blob/a4f93abb557f4b2f2700271c32ef982f6b891fc4/x-pack/plugins/event_log/generated/mappings.json#L69-L104

We wanted to make sure this is visible here in case our top-level kibana key clashes with the work being done on Kibana's ECS log usage.

@pmuellr
Member

pmuellr commented Jun 9, 2020

One of the things I want to look into for the event log used by alerting is the new data streams support. I'm guessing the other logging uses referenced here aren't at the point of needing to think about this yet, but I figured I'd mention it and see if anyone else is looking into this.

The driver for using data streams for the event log is to make it easier to describe the relationship between the indices, aliases, templates, and ILM policies. It turns out to be tricky to get these all to work together completely reliably, so it would be nice to get that additional reliability. Presumably, it also has some performance benefits for queries and maybe writes.

We'd target supporting this at a minor version level, as we currently have version-specific ES resources, so a new minor version can completely change the underlying implementation of these sorts of bits.
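A rough sketch of what that data streams direction might involve (via the Elasticsearch JS client, assuming the composable index template and data stream APIs introduced in 7.9; all names here are illustrative):

    import { Client } from '@elastic/elasticsearch';

    const client = new Client({ node: 'http://localhost:9200' });

    async function setUpEventLogDataStream() {
      // A composable template with a data_stream section: the first write to a
      // matching name auto-creates a data stream whose backing indices are
      // rolled over and aged out by the attached ILM policy.
      await client.indices.putIndexTemplate({
        name: 'kibana-event-log-template', // illustrative name
        body: {
          index_patterns: ['.kibana-event-log-*'],
          data_stream: {},
          template: {
            settings: { 'index.lifecycle.name': 'kibana-event-log-policy' },
            mappings: { properties: { '@timestamp': { type: 'date' } } },
          },
        },
      });

      // Data streams only accept create operations, so op_type must be 'create'.
      await client.index({
        index: '.kibana-event-log-8.0.0', // illustrative data stream name
        op_type: 'create',
        body: { '@timestamp': new Date().toISOString(), message: 'example event' },
      });
    }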

@legrego legrego removed the Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more! label Aug 3, 2021
@elasticmachine
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@lukeelmers
Member

This issue hasn't been active in 2 years, and most of the items are completed, so I'll go ahead and close it.

If anyone feels we still need it, feel free to reopen. ❤️

@tylersmalley tylersmalley removed the Team:Operations Team label for Operations Team label Sep 19, 2022