
[logging/metrics] Provide ability to log ops data from core metrics service #86162

Closed
lukeelmers opened this issue Dec 16, 2020 · 4 comments · Fixed by #88070

lukeelmers (Member) commented Dec 16, 2020

Related to #13241

In the legacy platform, we rely on the @hapi/good ops event for automatically logging ops metrics at a particular interval. The opsInterval is configurable and provided to the hapi/good configuration via the legacy logging server.
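
For context, here is a rough sketch of how an ops interval is wired up with @hapi/good so that it emits 'ops' events on a timer (illustrative only; the reporter pipeline and function name below are placeholders, not Kibana's actual legacy logging setup):

```ts
// Illustrative registration of @hapi/good with an ops interval; not Kibana code.
import { Server } from '@hapi/hapi';

export async function registerOpsReporting(server: Server, opsInterval: number) {
  await server.register({
    plugin: require('@hapi/good'),
    options: {
      ops: { interval: opsInterval }, // the configurable opsInterval, in milliseconds
      reporters: {
        opsReporter: [
          // keep only 'ops' events from the event stream
          { module: '@hapi/good-squeeze', name: 'Squeeze', args: [{ ops: '*' }] },
          'stdout', // write the selected events to stdout
        ],
      },
    },
  });
}
```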

As part of our effort to remove our dependency on @hapi/good, we should provide some equivalent functionality in the Kibana platform logger via the metrics service.

Currently, good includes a handful of metrics in the ops event payload**; however, the legacy logger only pulls the following items from it:

| Ops formatted log property | Location in @hapi/good ops payload | Location in metrics service |
| --- | --- | --- |
| type | `event` (`'ops'`) | |
| @timestamp | `timestamp` | |
| tags | - | |
| pid | `pid` | `process.pid` |
| memory | `proc.mem.heapUsed` | `process.memory.heap.used_in_bytes` |
| uptime | `proc.uptime` | `process.uptime_in_millis` |
| load | `os.load` | `os.load` |
| delay | `proc.delay` | `process.event_loop_delay` |

** More detail on the items good includes in the payload can be found in these old docs from even-better
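
For reference, the fields in the right-hand column of the table live on the OpsMetrics object produced by the metrics service; a trimmed sketch of just the relevant shape (the real interface contains more fields):

```ts
// Trimmed sketch of the parts of OpsMetrics referenced above; not the full interface.
interface OpsMetricsSubset {
  process: {
    pid: number;
    memory: { heap: { used_in_bytes: number } };
    uptime_in_millis: number;
    event_loop_delay: number;
  };
  os: {
    load: { '1m': number; '5m': number; '15m': number };
  };
}
```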

Tasks:

  • Update the core metrics service to log these metrics at the ops.interval from config
    • The metrics service should already be collecting all of these items in different locations, and is already using ops.interval to specify the collection interval (see the sketch after this list)
  • Integration tests
  • Documentation on where to find these logs moving forward
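
For orientation, a paraphrased sketch (not a verbatim excerpt of metrics_service.ts) of the existing collection loop, which already runs on the configured ops.interval; ops logging only needs to hook into the same refresh cycle:

```ts
// Paraphrased sketch of the wiring inside MetricsService.start(); names follow
// the existing service, but this is not an exact copy of the source.
const config = await this.coreContext.configService
  .atPath<OpsConfigType>('ops') // the same config path that holds ops.interval
  .pipe(first()) // `first` from 'rxjs/operators'
  .toPromise();

this.collectInterval = setInterval(() => {
  this.refreshMetrics(); // collect, format, log, and emit on each tick
}, config.interval.asMilliseconds());
```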

Open questions / things to investigate:

  • Which logging context to use?
    • 'metrics' probably makes the most sense unless there's a reason we want it to be separate from the logs related to the service itself (see the sketch after this list)
  • Is there anything we can remove/simplify in @kbn/legacy-logging that still preserves BWC?
    • e.g. Could we drop some of the special handling in the log formatting so that we reformat the existing data from the metrics service instead of what comes from the hapi/good ops events? This would be cleaner.
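
A minimal sketch (assumed wiring with approximate import paths; not necessarily the final implementation) of what using the 'metrics' context would look like from the metrics service:

```ts
// src/core/server/metrics/metrics_service.ts (sketch only)
import type { CoreContext } from '../core_context';
import type { Logger } from '../logging';

export class MetricsService {
  private readonly logger: Logger;

  constructor(private readonly coreContext: CoreContext) {
    // Ops log entries would then appear under the [metrics] logging context
    this.logger = coreContext.logger.get('metrics');
  }
}
```
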
@lukeelmers lukeelmers added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Feature:Logging labels Dec 16, 2020
elasticmachine (Contributor) commented:

Pinging @elastic/kibana-core (Team:Core)

TinaHeiligers (Contributor) commented:

@lukeelmers I've gone with intercepting refreshMetrics(), where we already collect the metrics at the configured ops.interval, and used an approach similar to the legacy logging implementation to extract the data we're interested in and format it in the same manner:

Logging metrics from within refreshMetrics():

```ts
// src/core/server/metrics/metrics_service.ts

  private async refreshMetrics() {
    this.logger.debug('Refreshing metrics');
    const metrics = await this.metricsCollector!.collect(); // capture the metrics
    const opsLogsMetrics = this.extractOpsLogsData(metrics); // extract and format the metrics for logging
    this.logger.info(opsLogsMetrics); // log the metrics out

    this.metricsCollector!.reset();
    this.metrics$.next(metrics);
  }
```
Extracting and formatting the metrics for logging:

```ts
// src/core/server/metrics/metrics_service.ts

  private extractOpsLogsData({ process, os }: Partial<OpsMetrics>): string {
    const memoryLogEntryInMB = numeral(process?.memory?.heap?.used_in_bytes ?? 0).format('0.0b');
    // ProcessMetricsCollector converts from seconds to milliseconds. Format here is HH:mm:ss for backward compatibility
    const uptimeLogEntry = numeral((process?.uptime_in_millis ?? 0) / 1000).format('00:00:00');
    const loadLogEntry = [...Object.values(os?.load ?? [])]
      .map((val: number) => {
        return numeral(val).format('0.00');
      })
      .join(' ');
    const delayLogEntry = numeral(process?.event_loop_delay ?? 0).format('0.000');

    return `memory: ${memoryLogEntryInMB} uptime: ${uptimeLogEntry} load: [${loadLogEntry}] delay: ${delayLogEntry}`;
  }
```
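
For readers unfamiliar with the numeral format tokens used above, a quick illustration (the input values and the import path are assumptions for illustration, not taken from a real run):

```ts
import numeral from '@elastic/numeral';

numeral(347_758_592).format('0.0b'); // bytes -> a human-readable size such as '331.7MB'
numeral(13.4).format('00:00:00');    // seconds -> an 'H:mm:ss' uptime such as '0:00:13'
numeral(2.8642).format('0.00');      // a two-decimal load average such as '2.86'
numeral(0.2879).format('0.000');     // a three-decimal event loop delay such as '0.288'
```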

Example of the output:

With logging appenders:

```yaml
# kibana.dev.yml
logging:
  events:
    log: ['warning', 'error', 'fatal', 'info', 'reporting']
    json: false
    verbose: true
    # added for logging work
  appenders:
    console:
      kind: console
      layout:
        kind: pattern
        highlight: true

  root:
    appenders: [default]
    level: warn

  loggers:
    - context: metrics
      appenders: [console]
      level: info
```

Outputs:

```
[2021-01-11T18:03:01.138Z][INFO ][http.server.Kibana] http server running at http://localhost:5601/chr
[2021-01-11T18:03:04.088Z][INFO ][metrics] memory: 331.7MB uptime: 0:00:13 load: [2.86 4.13 3.38] delay: 0.288
[2021-01-11T18:03:04.428Z][INFO ][metrics] memory: 246.9MB uptime: 0:00:18 load: [2.86 4.13 3.38] delay: 0.223
```

Without appenders, the logs are forwarded to the legacy logging system and we will get duplicate data:

Outputs:

```
server    ops   [10:57:45.324]  memory: 356.6MB uptime: 0:00:21 load: [8.12 5.40 3.30] delay: 3.034                 -> legacy logging
server    log   [10:57:45.802] [info][metrics] memory: 357.3MB uptime: 0:00:21 load: [8.12 5.40 3.30] delay: 0.155  -> NP logging forwarded to legacy logging
[2021-01-11T17:57:47.954Z][INFO ][metrics] memory: 121.1MB uptime: 0:00:29 load: [8.12 5.40 3.30] delay: 0.226      -> NP logging from metrics service
```

I'd like to make sure this is the correct approach and that there aren't any obvious issues with it before diving into integration tests (for which I'll need to call on you for some more guidance 😉).

pgayvallet (Contributor) commented:

> Without appenders, the logs are forwarded to the legacy logging system and we will get duplicate data:

That's the same issue @lukeelmers encountered with request/response logging in #13241. I guess the solution will be to log them under the debug level in core to avoid this duplication?

TinaHeiligers (Contributor) commented Jan 14, 2021

@pgayvallet In my draft for this work, I've gone with logging the data under the info level.
