Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rolling-file appender to core logging #84735

Merged
merged 28 commits into from
Dec 10, 2020

Conversation

pgayvallet
Copy link
Contributor

@pgayvallet pgayvallet commented Dec 2, 2020

Summary

Fix #56291

This PR adds a new rolling-file appender to core's logging system.

The appender currently supports two triggering policy: time-based (time-interval) and size-base (size-limit), and one rolling strategy: appending a numeric-based suffix at the end of the file when rolling it (numeric). See src/core/server/logging/README.md in the PR for more documentation

Examples

Rolling every 24h

logging:
  appenders:
    lets-roll:
      kind: rolling-file
      path: /var/logs/kibana.log
      policy:
        kind: time-interval
        interval: 24h
        modulate: true
      strategy:
        kind: numeric
        pattern: '-%i'
        max: 2
      layout:
        kind: pattern

This appender is configured to roll the file every 24h hours (interval: 24h) everyday at 0:00am (modulate: true). it will retain a maximum of 2 (max: 2) rolled files:

  • During the first rollover, kibana.log is renamed to kibana-1.log. A new kibana.log file is created and starts
    being written to.
  • During the second rollover, kibana-1.log is renamed to kibana-2.log and kibana.log is renamed to kibana-1.log.
    A new kibana.log file is created and starts being written to.
  • During the third and subsequent rollovers, kibana-2.log is deleted, kibana-1.log is renamed to kibana-2.log and
    kibana.log is renamed to kibana-1.log. A new kibana.log file is created and starts being written to.

Rolling once the file reaches 50mb

logging:
  appenders:
    rock-n-roll:
      kind: rolling-file
      path: /var/logs/kibana.log
      policy:
        kind: size-limit
        size: 50mb
      strategy:
        kind: numeric
        pattern: '-%i'
        max: 2
      layout:
        kind: pattern

This appender is configured to roll the file once it reaches 50mb. The same rolling strategy than previous example applies.

Checklist

@pgayvallet pgayvallet added this to Pending Review in kibana-core [DEPRECATED] via automation Dec 2, 2020
@pgayvallet pgayvallet moved this from Pending Review to In Progress in kibana-core [DEPRECATED] Dec 2, 2020
Copy link
Contributor Author

@pgayvallet pgayvallet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self-review

src/core/server/server.api.md Show resolved Hide resolved
Comment on lines 138 to 144
try {
await this.strategy.rollout();
await this.fileManager.closeStream();
} catch (e) {
// eslint-disable-next-line no-console
console.log('Error while rolling file: ', e);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm forced to try/catch this, else it could result in an unhandled promise rejection from the call in append.

I'm not sure what to do with the error though. Crashing the server if the rollout failed seems extreme, but I'm not sure console.log is enough. I could eventually keep an internal ConsoleAppender and use it to log the error instead of relying on console.log, WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a failed rollout never, sometimes, or always prevent future logs from being written? My guess is sometimes?

Eventually keeping an internal ConsoleAppender sounds like a good idea to me. In the short term, I don't have many other suggestions, other than changing console.log to console.error, so that we can at least see this in stderr instead of stdout.

In the context of audit logging specifically, we've discussed optionally halting the server to prevent users from interacting with Kibana when we can't audit their actions: #60636. This is out of scope for this PR, but it might inform how we decide to handle these failures.

In the hypothetical future, we could leverage the notification center to alert administrators about this as well.

Copy link
Contributor Author

@pgayvallet pgayvallet Dec 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a failed rollout never, sometimes, or always prevent future logs from being written?

it's never never when you interact with filesystem, but except in case of hard I/O failure, I think the worth case would be that we would not be able to roll the log file, and that the next rolling cycle would still write in the previous (non-renamed) file. We skip a rolling, basically, which seemed acceptable (also we can't really do anything except warning about the failure)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what to do with the error though. Crashing the server if the rollout failed seems extreme, but I'm not sure console.log is enough.

It seems Elasticsearch logs a caught exception without any additional measurements (we can further check with them). see elastic/elasticsearch#45523

src/core/server/logging/logging_system.ts Outdated Show resolved Hide resolved
src/core/server/logging/logging_system.ts Show resolved Hide resolved
@pgayvallet pgayvallet marked this pull request as ready for review December 3, 2020 19:24
@pgayvallet pgayvallet requested a review from a team as a code owner December 3, 2020 19:24
@pgayvallet pgayvallet added the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label Dec 3, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@pgayvallet pgayvallet added the release_note:skip Skip the PR/issue when compiling release notes label Dec 3, 2020
@pgayvallet pgayvallet moved this from In Progress to Pending Review in kibana-core [DEPRECATED] Dec 3, 2020
Copy link
Member

@legrego legrego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this new appender for Kibana's new audit log, and it's working very nicely! This is a really impressive PR, thanks again for putting this up so quickly

Comment on lines 138 to 144
try {
await this.strategy.rollout();
await this.fileManager.closeStream();
} catch (e) {
// eslint-disable-next-line no-console
console.log('Error while rolling file: ', e);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a failed rollout never, sometimes, or always prevent future logs from being written? My guess is sometimes?

Eventually keeping an internal ConsoleAppender sounds like a good idea to me. In the short term, I don't have many other suggestions, other than changing console.log to console.error, so that we can at least see this in stderr instead of stdout.

In the context of audit logging specifically, we've discussed optionally halting the server to prevent users from interacting with Kibana when we can't audit their actions: #60636. This is out of scope for this PR, but it might inform how we decide to handle these failures.

In the hypothetical future, we could leverage the notification center to alert administrators about this as well.

this.currentFileTime = birthtime.getTime();
this.currentFileSize = size;
} catch (e) {
this.currentFileTime = Date.now();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there only certain failure scenarios that should cause us to reset the file info like this? Or do we really want to reset whenever any error is caught?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we would only catch ENOENT (or such) errors. But that's kinda the first time we got a logger/appender implementation complex enough to potentially throw errors, so I'm not really sure how we should handle failures internal to the logger system or its appenders, so for safety I used a catch-all instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we would only catch ENOENT (or such) errors. But that's kinda the first time we got a logger/appender implementation complex enough to potentially throw errors,

shouldn't we log an exception different from ENOENT to console at least?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will add a ENOENT check and logs the error.

Remaining question (same as in #84735 (comment)) is, should we:

  1. use console.log
  2. use console.error
  3. create a ConsoleAppender in the RollingFileAppender and use it internally for all error logging (we would use the same layout as the one provided to the rolling file appender). In that case, it will logs in stdout as if we were using console.log

wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 3rd option looks fine. However, I'd expect that it uses the default layout that a user specifies for the root context.
If the 3rd option is too hard to implement, the 2nd one is well enough.

export { RollingStrategy } from './strategy';
export type RollingStrategyConfig = NumericRollingStrategyConfig;

export const rollingStrategyConfigSchema = schema.oneOf([numericRollingStrategyConfigSchema]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we only have a single strategy available, what do you think about providing a default value, so that consumers aren't forced to specify it? I mistakenly omitted this from my yml, and the @kbn/config-schema error message was hard to decipher at best.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I even added default values for both the policy and the strategy at some point, so that

rolling-file:
    kind: rolling-file
    path: /Users/pierregayvallet/Documents/test-log/kibana.log
    layout:
        kind: pattern

would be enough, then I reverted it, thinking that we didn't do that for other appenders (like you must always specify layout, there isn't a default value for it).

@elastic/kibana-core do you have any opinion on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't see a reason not to add defaults. we have got them for some values

const DEFAULT_PATTERN = `[%date][%level][%logger]%meta %message`;
export const patternSchema = schema.string({
validate: (string) => {
DateConversion.validate!(string);
},
});
const patternLayoutSchema = schema.object({
highlight: schema.maybe(schema.boolean()),
kind: schema.literal('pattern'),
pattern: schema.maybe(patternSchema),
});


export const sizeLimitTriggeringPolicyConfigSchema = schema.object({
kind: schema.literal('size-limit'),
size: schema.byteSize({ min: '1b', defaultValue: '100mb' }),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you discussed the defaults with Cloud? The legacy appender has other limits + we might want to have higher min and explicit max limits

// > 1MB
.greater(1048576)
// < 1GB
.less(1073741825)
// 10MB
.default(10485760),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't. @Kushmaro maybe you can help us here?

Copy link

@Kushmaro Kushmaro Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pgayvallet
By our log delivery feature design (see storage section) we concluded that the smallest kibana instance in cloud will have around 1GB of storage space

is the 100mb referring to the log size rotation limit? cloud also enforces a quota on the log path, and I * think * that quota is 100mb total, so maybe a more sensible rotation limit would be 10mb.

@s-nel for awareness and further guidelines (he's on PTO though so may take a while)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the 100mb referring to the log size rotation limit

That's the maximum size of the log file before it got rolled out.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use 100MB rollover size for ES audit logs and keep 2 files. We might choose something smaller for Kibana since the instances don't have much storage.

this.currentFileTime = birthtime.getTime();
this.currentFileSize = size;
} catch (e) {
this.currentFileTime = Date.now();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we would only catch ENOENT (or such) errors. But that's kinda the first time we got a logger/appender implementation complex enough to potentially throw errors,

shouldn't we log an exception different from ENOENT to console at least?


const logger = root.logger.get('test.rolling.file');

// size = 100b, message.length ~= 40b, should roll every 3 message
Copy link
Contributor

@mshustov mshustov Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can create a buffer of fixed size to make sure to have better control over the rolling process

const asciiBuf = Buffer.alloc(5, 'a', 'utf8');

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only using ascii chars in the message, so we kinda know that sizeInBytes === msg.length, but I can use a buffer to add more control if we want to.

src/core/server/logging/logging_system.ts Outdated Show resolved Hide resolved
@pgayvallet
Copy link
Contributor Author

Copy link
Contributor

@mshustov mshustov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can change size-limit policy defaults in a follow up

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Distributable file count

id before after diff
default 46991 47768 +777
oss 27598 27615 +17

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@pgayvallet pgayvallet merged commit 8cdcae2 into elastic:master Dec 10, 2020
kibana-core [DEPRECATED] automation moved this from Pending Review to Done (7.11) Dec 10, 2020
pgayvallet added a commit to pgayvallet/kibana that referenced this pull request Dec 10, 2020
* You need to start somewhere

* revert comment

* rename default strategy to numeric

* add some tests

* fix some tests

* update documentation

* update generated doc

* change applyBaseConfig to be async

* fix integ tests

* add integration tests

* some renames

* more tests

* more tests

* nits on README

* some self review

* doc nits

* self review

* use `escapeRegExp` from lodash

* address some review comments

* a few more nits

* extract `isDevCliParent` check outside of LoggingSystem.upgrade

* log errors from context

* add defaults for policy/strategy
pgayvallet added a commit that referenced this pull request Dec 10, 2020
* You need to start somewhere

* revert comment

* rename default strategy to numeric

* add some tests

* fix some tests

* update documentation

* update generated doc

* change applyBaseConfig to be async

* fix integ tests

* add integration tests

* some renames

* more tests

* more tests

* nits on README

* some self review

* doc nits

* self review

* use `escapeRegExp` from lodash

* address some review comments

* a few more nits

* extract `isDevCliParent` check outside of LoggingSystem.upgrade

* log errors from context

* add defaults for policy/strategy
gmmorris added a commit to gmmorris/kibana that referenced this pull request Dec 10, 2020
* master: (53 commits)
  Fixing recovered instance reference bug (elastic#85412)
  Switch to new elasticsearch client for Visualizations (elastic#85245)
  Switch to new elasticsearch client for TSVB (elastic#85275)
  Switch to new elasticsearch client for Vega (elastic#85280)
  [ILM] Add shrink field to hot phase (elastic#84087)
  Add rolling-file appender to core logging (elastic#84735)
  [APM] Service overview: Dependencies table (elastic#83416)
  [Uptime ]Update empty message for certs list (elastic#78575)
  [Graph] Fix graph saved object references (elastic#85295)
  [APM] Create new API's to return Latency and Throughput charts (elastic#85242)
  [Advanced settings] Reset to default for empty strings (elastic#85137)
  [SECURITY SOLUTION] Bundles _source -> Fields + able to sort on multiple fields in Timeline (elastic#83761)
  [Fleet] Update agent listing for better status reporting (elastic#84798)
  [APM] enable 'sanitize_field_names' for Go (elastic#85373)
  Update dependency @elastic/charts to v24.4.0 (elastic#85452)
  Introduce external url service (elastic#81234)
  Deprecate disabling the security plugin (elastic#85159)
  [FLEET] New Integration Policy Details page for use in Integrations section (elastic#85355)
  [Security Solutions][Detection Engine] Fixes one liner access control with find_rules REST API
  chore: 🤖 remove extraPublicDirs (elastic#85454)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_note:skip Skip the PR/issue when compiling release notes Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.11.0 v8.0.0
Projects
Development

Successfully merging this pull request may close these issues.

[KP] Implement Log rotation appender
7 participants