Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Add useful logging #33

Conversation

dai-chen
Copy link
Member

@dai-chen dai-chen commented Apr 17, 2019

Issue #, if available: #19

Description of changes: The following changes are being introduced in this PR:

  1. Generate unique ID for each request for tracking
  2. Prepend request ID to the logging along the path
  3. Extra log for slow query with configurable threshold (add generic code for using ES cluster settings)
  4. Return only general message in details in error message for security concern.

All unit test and legacy/new integration tests passed. Checked added log in elasticsearch.log and did some manually testing in Kibana, primarily for slow log threshold:

POST _opendistro/_sql
{
  "query": "SELECT * FROM accounts WHERE age = 30"
}

PUT /_cluster/settings
{
  "transient": {
    "opendistro.sql.query.slowlog": "0"
  }
}

PUT /_cluster/settings
{
  "transient": {
    "opendistro.sql.query.slowlog": null
  }
}

PUT /_cluster/settings
{
  "persistent": {
    "opendistro.sql.query.slowlog": "0"
  }
}

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@dai-chen dai-chen added the enhancement New feature or request label Apr 17, 2019
@dai-chen dai-chen self-assigned this Apr 17, 2019
@dai-chen dai-chen changed the title Add useful logging quickfix Add useful logging Apr 17, 2019
Map<String, String> params,
QueryAction action,
RestChannel channel) throws Exception {
Instant startTime = Instant.now();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instant.now() can have pretty big errors depending on time drift/skew/last time of sync of system time/etc. Please use System.nanoTime() for anything where you need to measure a period and not get a specific point in time (https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#nanoTime--).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instant.now() can have pretty big errors depending on time drift/skew/last time of sync of system time/etc. Please use System.nanoTime() for anything where you need to measure a period and not get a specific point in time (https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#nanoTime--).

Thanks for the suggestion! I wasn't aware of this. Could you pass me some reference for the problem you mentioned? Because we only care about the delta value returned from Duration.between, is it possible to be wrong too in some case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one touches upon the problem a little bit: https://www.baeldung.com/java-measure-elapsed-time. Most of the sources say that resolution of nanoTime is better, but that's not the issue. The real difference is that System.currentTimeMillis() and Instant.now() (which relies on the former) both use wall-clock time from the JVM, which in its turn gets it from OS. Depending on what protocol the machine is using to synchronize the time, it can lag behind or go far ahead and then during sync get a huge change (either negative or positive). The System.nanoTime() is based on software-based counter (not related to wall clock), which only increases, and does not need to be synced with any other machine/server/clock, so the diffs of values from nanoTime will always be pretty accurate (I guess as long as you are using it on the same thread).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one touches upon the problem a little bit: https://www.baeldung.com/java-measure-elapsed-time. Most of the sources say that resolution of nanoTime is better, but that's not the issue. The real difference is that System.currentTimeMillis() and Instant.now() (which relies on the former) both use wall-clock time from the JVM, which in its turn gets it from OS. Depending on what protocol the machine is using to synchronize the time, it can lag behind or go far ahead and then during sync get a huge change (either negative or positive). The System.nanoTime() is based on software-based counter (not related to wall clock), which only increases, and does not need to be synced with any other machine/server/clock, so the diffs of values from nanoTime will always be pretty accurate (I guess as long as you are using it on the same thread).

Thanks! Good to know the subtle difference. I think Instant and Duration is sufficient in our case. Because we are not benchmarking, we just need a rough elapsed time in second(s). nanoTime() may be more accurate in some edge case, though it's hassle to convert it to sec or millisec for logging or comparison. Hopefully this makes sense to you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just two lines that need to be changed:
#140 should be final long startNanos = System.nanoTime();
#145 should be `final Duration elapsed = Duration.ofNanos(System.nanoTime() - startNanos);

It's up to you if you want to keep it this way, not a blocker.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just two lines that need to be changed:
#140 should be final long startNanos = System.nanoTime();
#145 should be `final Duration elapsed = Duration.ofNanos(System.nanoTime() - startNanos);

It's up to you if you want to keep it this way, not a blocker.

Easier than I thought. Testing it. Will push very soon. Thanks!

Copy link
Contributor

@arsen-es arsen-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create the issue for updating the request id logging approach and include in the response to the relevant comment before merging and closing this PR.

@dai-chen
Copy link
Member Author

Please create the issue for updating the request id logging approach and include in the response to the relevant comment before merging and closing this PR.

Done!

@dai-chen dai-chen merged commit 93ffbb7 into opendistro-for-elasticsearch:master Apr 18, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants