Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[eventLog] prevent log writing when initialization fails #71339

Merged
merged 2 commits into from
Jul 14, 2020

Conversation

pmuellr
Copy link
Member

@pmuellr pmuellr commented Jul 9, 2020

resolves #68309

Summary

Previously, if the initialization of the elasticsearch resources failed
during initialization, the event logger would still try to write events.
Which is somewhat of a catastrophic failure, as typically the logger would
try writing to the alias name, but no alias exists, so a new index would
be created with the name of the alias. Making it impossible to initialize
successfully later until that index was deleted.

The core initialization calls already returned success indicators, so this
PR just responds to those and prevents the logger from writing to the index
if intialization failed.

Checklist

Delete any items that are not applicable to this PR.

@pmuellr pmuellr added Feature:Alerting v8.0.0 release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v7.9.0 labels Jul 9, 2020
resolves elastic#68309

Previously, if the initialization of the elasticsearch resources failed
during initialization, the event logger would still try to write events.
Which is somewhat of a catastrophic failure, as typically the logger would
try writing to the alias name, but no alias exists, so a new index would
be created with the name of the alias.  Making it impossible to initialize
successfully later until that index was deleted.

The core initialization calls already returned success indicators, so this
PR just responds to those and prevents the logger from writing to the index
if intialization failed.
@pmuellr pmuellr force-pushed the eventLog/init-failure-disable branch from 6a5ffab to bd72953 Compare July 13, 2020 15:04
@pmuellr pmuellr marked this pull request as ready for review July 13, 2020 15:38
@pmuellr pmuellr requested a review from a team as a code owner July 13, 2020 15:38
Copy link
Contributor

@gmmorris gmmorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good overall, but I do wonder if we should be throwing rather than making it a NOOP.

@@ -17,7 +17,7 @@ const createContextMock = () => {
logger: loggingSystemMock.createLogger(),
esNames: namesMock.create(),
initialize: jest.fn(),
waitTillReady: jest.fn(),
waitTillReady: jest.fn(async () => Promise.resolve(true)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

micro nit.
Isn't this the same as... ?

Suggested change
waitTillReady: jest.fn(async () => Promise.resolve(true)),
waitTillReady: jest.fn(async () => true),

If so, it just seems a little easier to understand and maintain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should try putting that back to your suggesting - I had some issues trying to get it to fire, so tried "the long way" so I could debug it. thx!

Comment on lines +186 to +191
const success = await esContext.waitTillReady();
if (!success) {
esContext.logger.debug(`event log did not initialize correctly, event not written`);
return;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me wonder - we will now be relying on the Event Log not just as a log, but as an operational data source for our product (alert instances over time, their values etc.).

Perhaps failing to index a log event should be a failing operation rather than a NOOP? 🤔

Copy link
Contributor

@YulNaumenko YulNaumenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pmuellr
Copy link
Member Author

pmuellr commented Jul 13, 2020

from @gmmorris #71339 (review)

This looks good overall, but I do wonder if we should be throwing rather than making it a NOOP.

Not sure of the exact context here, but guessing you're referring to the top-level logEvent() call.

Currently the logEvent() method that is used by actions/alerts/etc to log an event is basically async, even though the signature is that it's sync. We setImmediate() internally to get it to go async. So if we threw out of that, there's no one to catch it! The idea is to make it as easy as possible for clients to log events, and not have to worry about the background machinery like the async bits or initialization errors.

also from Gidi #71339 (comment)

This makes me wonder - we will now be relying on the Event Log not just as a log, but as an operational data source for our product (alert instances over time, their values etc.).
Perhaps failing to index a log event should be a failing operation rather than a NOOP? 🤔

Fair question, especially if/when we become more dependent on event log data as a source of truth. But a change in the basic assumptions event log was built with. Worth an issue to discuss I think. We might even want to force the event log initialization during start and have Kibana fail to start if the EL initialization fails, I guess as the most severe form of this.

I guess I've been trying to keep in my mind that EL provides a lot of useful information, but we need to be careful about being too dependent on it, especially since the user is control over the lifetime of the data, by editing the ILM policy. It's also possible to soft-disable the indexing of events completely, via a config value in EL.

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Build metrics

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@pmuellr pmuellr merged commit 67f466a into elastic:master Jul 14, 2020
pmuellr added a commit to pmuellr/kibana that referenced this pull request Jul 14, 2020
resolves elastic#68309

Previously, if the initialization of the elasticsearch resources failed
during initialization, the event logger would still try to write events.
Which is somewhat of a catastrophic failure, as typically the logger would
try writing to the alias name, but no alias exists, so a new index would
be created with the name of the alias.  Making it impossible to initialize
successfully later until that index was deleted.

The core initialization calls already returned success indicators, so this
PR just responds to those and prevents the logger from writing to the index
if initialization failed.

# Conflicts:
#	x-pack/plugins/event_log/server/es/context.test.ts
gmmorris added a commit to gmmorris/kibana that referenced this pull request Jul 14, 2020
* master: (21 commits)
  [Maps] 7.9 design improvements (elastic#71563)
  [ML] Changing all calls to ML endpoints to use internal user (elastic#70487)
  [eventLog] prevent log writing when initialization fails (elastic#71339)
  [Observability] landing page always being displayed (elastic#71494)
  [IM] Address data stream copy feedback (elastic#71615)
  [Logs UI] Anomalies page dataset filtering (elastic#71110)
  [data.search.aggs] Remove `use_field_mapping` from top hits agg (elastic#71168)
  [ML] Anomaly swim lane embeddable navigation and filter actions (elastic#71082)
  Fixes typo in siem_cloudtrail job description (elastic#71569)
  Require granted API Keys to have a name (elastic#71623)
  Update  getUsageForCollection (elastic#71609)
  Only fetch saved elements once (elastic#71310)
  [SecuritySolution][Resolver] Adding siem index and guarding process ancestry (elastic#71570)
  [APM] Additional data telemetry changes (elastic#71112)
  [Visualize] Fix export table for table export links (elastic#71249)
  [Search] Server side search API (elastic#70446)
  use inclusive language (elastic#71607)
  [Security Solution] Hide timeline footer when Resolver is open (elastic#71516)
  [Index template wizard] Remove shadow and use border for components panels (elastic#71606)
  [ML] Kibana API endpoint for histogram chart data (elastic#70976)
  ...
pmuellr added a commit that referenced this pull request Jul 14, 2020
…1662)

Previously, if the initialization of the elasticsearch resources failed
during initialization, the event logger would still try to write events.
Which is somewhat of a catastrophic failure, as typically the logger would
try writing to the alias name, but no alias exists, so a new index would
be created with the name of the alias.  Making it impossible to initialize
successfully later until that index was deleted.

The core initialization calls already returned success indicators, so this
PR just responds to those and prevents the logger from writing to the index
if initialization failed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backported Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v7.9.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Alerting] event log is not "disabled" when initialization fails
4 participants