Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User Pluggable Connection Time Trace Logger #1077

Merged
merged 7 commits into from
Feb 28, 2024
Merged

User Pluggable Connection Time Trace Logger #1077

merged 7 commits into from
Feb 28, 2024

Conversation

scottf
Copy link
Contributor

@scottf scottf commented Feb 11, 2024

Allow users to provide a replacement for the trace logging feature. Currently it's just simple check a flag / write to the console.
This change provides a fully backward compatible implementation that can be overridden per connection.

Also Fixed

Also fixed timeCheck reporting

// if the dev provided an impl, we assume they meant to time trace the connection
traceConnection = true;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the old impl just had the traceConnection flag. Since we are switching to an interface/impl, this checks the flag and makes the appropriate impl. If the flag is false, the impl is a no-op, which is actually less operations than the old implementation that would call a method, pass a the traceConnection flag and check the flag even when trace connection is off. The no-op is just a method call as a comparison.

@@ -1774,6 +1811,7 @@ private Options(Builder b) {
this.reconnectDelayHandler = b.reconnectDelayHandler;

this.errorListener = b.errorListener == null ? new ErrorListenerLoggerImpl() : b.errorListener;
this.timeTraceLogger = b.timeTraceLogger;
Copy link
Contributor Author

@scottf scottf Feb 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the builder prepares it and now also prepares the error listener for consistency

}
this.unprocessedServers = b.unprocessedServers; // exactly how the user gave them
this.natsServerUris = Collections.unmodifiableList(b.natsServerUris);
this.unprocessedServers = Collections.unmodifiableList(b.unprocessedServers); // exactly how the user gave them
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.After reviewing the builder, I noticed that natsServerUris was already never empty so that was unnecessary.
2. All lists, since they can be viewed from public api needs to be unmodifiable.

@stefanLeo
Copy link

stefanLeo commented Feb 12, 2024

Yes. Fully understood. So the use cases are from my PoV:

  1. Have an easy integration to grab whatever you log deep inside your code and ingest into my custom logging framework incl. the timestamps etc.
    I fully agree that excessive logging ist not a good idea here, but adding logs at some critical points can help a lot with troubleshooting especially for new users who do not yet know what is important for nats to add to as listeners etc...
  2. If the lib provides its set of logs it is much easier for you as a maintainer to debug/analyse issues on a customer site or if anyone opens a bug...
    After all it is then your logs and not just some provided by some custom injected logging that may be compleltely wrong and/or it takes you ages to understand those logs.
  3. So far for logging in general. Specifically: With DNS we had issues with the DNS server during failover tests in Openshift which led to reconnection timeouts of 30+ secs. It took us one week to figure out that the dns part was problematic using a profiling techniques. I.e., imho whenever something you request from an external system you should log it and the returned result in case it is something that does not happen all the time e.g.. connection interruptions or abortions, dns resolving, config parsing issues,... things like that.
    Simply to make everyones life easier when having to troubleshoot issues or integrate nats.

@scottf scottf changed the title User Pluggable Connection Time Trace Logger DRAFT: User Pluggable Connection Time Trace Logger Feb 12, 2024
@scottf
Copy link
Contributor Author

scottf commented Feb 12, 2024

Ok let's think about places where we can safely add logging into. Anything around Connection/Reconnects. Anything around JetStreamManagement, but that's high level, so not sure how that is useful. I know sometimes the errors are IOException and it's kinda useless. Where else?

@scottf scottf marked this pull request as draft February 12, 2024 21:40
@stefanLeo
Copy link

stefanLeo commented Feb 13, 2024

Examples from my Pov:

  • DNS resolving
  • Reconnections
  • authentication issues/TLS issues
  • Connection abortions due to whatever (timeouts, ping outages, ...)
  • Unparsabel messages received like the IllegalStateExceptions / NumberFormatExceptions in NatsConnectionReader > you get them via the handlers I suppose anyhow.. but imho logging the error before throwing an exception still makes sense.
  • Some others that may be valuable:
    -- If the file used in fileauthhandler is null you assume this is a challengae only auth > log this assumption.
    -- jwt token parsing issues in fileauthhandler (right now it is an Exception inside an IllegalStateException inside an IOException
    -- I am not a nats java expert (yet :-)) so there may be more in that direction to be added peu à peu

@scottf scottf marked this pull request as ready for review February 20, 2024 21:39
@scottf scottf changed the title DRAFT: User Pluggable Connection Time Trace Logger User Pluggable Connection Time Trace Logger Feb 20, 2024
}
else {
double seconds = ((double) remaining) / 1_000_000_000.0;
timeTraceLogger.trace(message + String.format(", %.3f (s) remaining", seconds));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block is the change to reporting the time check

assertTrue(traces.get(i++).startsWith("starting ping and cleanup timers"));
assertTrue(traces.get(i++).startsWith("updating status to connected"));
assertTrue(traces.get(i++).startsWith("status updated"));
assertTrue(traces.get(i).startsWith("connect complete"));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is assuring backward compatibility on messages in case someone was relying on them in logs.

nc.traceTimeCheck("lt 0, lt -1_000_000_000", -1_100_000_000);
assertEquals("lt 0, lt -1_000_000_000, 1.100 (s) beyond timeout", l.lastTrace);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests the new implementation of time check reporting.

Copy link
Collaborator

@jarretlavallee jarretlavallee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@scottf scottf merged commit 39d651b into main Feb 28, 2024
2 checks passed
@scottf scottf deleted the TimeTraceLogger branch February 28, 2024 23:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants