-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/suppress clientpool stacktrace spam #2373
Conversation
These stack traces spam the logs (and were logged twice per attempt!). Additionally, they provide minimal information.
This is an attempt to restore the pertinent information lost by the stack trace. If a deployment is seeing many timeouts that are successfully retried, they may be able to determine the culprit by examining the query strings, and concurrently checking the product's slow-logs (timed out queries will very likely be logged there).
The toString method is called in CassandraClientPool, if the function is retried.
Exception messages are considered unsafe. The query string, sadly, isn't safe yet - it mentions a column family, and we can't guarantee safety of the column family name.
Note that a refactor in this PR probably clashes with #2363. |
@@ -329,15 +332,15 @@ private void checkAndUpdateBlacklist() { | |||
if (isHostHealthy(host)) { | |||
blacklistedHosts.remove(host); | |||
log.error("Added host {} back into the pool after a waiting period and successful health check.", | |||
host); | |||
SafeArg.of("host", host)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's change this to info?
} | ||
} | ||
} | ||
} | ||
|
||
private void addToBlacklist(InetSocketAddress badHost) { | ||
blacklistedHosts.put(badHost, System.currentTimeMillis()); | ||
log.info("Blacklisted host '{}'", badHost); | ||
log.info("Blacklisted host '{}'", SafeArg.of("badHost", badHost)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's change this to warn?
@@ -348,7 +351,7 @@ private boolean isHostHealthy(InetSocketAddress host) { | |||
return true; | |||
} catch (Exception e) { | |||
log.error("We tried to add {} back into the pool, but got an exception" | |||
+ " that caused us to distrust this host further.", host, e); | |||
+ " that caused us to distrust this host further.", SafeArg.of("host", host), e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warn maybe? and not sure if we want the exception. maybe just the message?
@@ -582,7 +586,12 @@ private void refreshTokenRanges() { | |||
triedHosts.add(hostPool.getHost()); | |||
this.<K>handleException(numTries, hostPool.getHost(), e); | |||
if (isRetriableWithBackoffException(e)) { | |||
log.warn("Retrying with backoff a query intended for host {}.", hostPool.getHost(), e); | |||
log.warn("Retrying with backoff a query, {}, intended for host {}. Exception message was: {} : {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log the sleep duration maybe? and maybe remove the exception?
throw (K) ex; | ||
} | ||
} else { | ||
log.warn("Error occurred talking to cassandra. Attempt {} of {}.", numTries, MAX_TRIES_TOTAL, ex); | ||
log.warn("Error occurred talking to cassandra. Attempt {} of {}. Exception message was: {} : {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log the exception the first time only
Codecov Report
@@ Coverage Diff @@
## develop #2373 +/- ##
=============================================
- Coverage 59.84% 59.83% -0.01%
- Complexity 3328 4617 +1289
=============================================
Files 823 823
Lines 38558 38580 +22
Branches 4004 4005 +1
=============================================
+ Hits 23075 23085 +10
- Misses 14035 14045 +10
- Partials 1448 1450 +2
Continue to review full report at Codecov.
|
int sleepDuration = numTries * 1000 + (ThreadLocalRandom.current().nextInt(1000) - 500); | ||
log.warn("Retrying with backoff ({}ms} a query, {}, intended for host {}.", | ||
SafeArg.of("sleepDuration", sleepDuration), | ||
fn.toString(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should we wrapped in an Arg
…tpool-stacktrace-spam
throw (K) ex; | ||
} | ||
} else { | ||
log.warn("Error occurred talking to cassandra. Attempt {} of {}.", numTries, MAX_TRIES_TOTAL, ex); | ||
// Only log the actual exception the first time | ||
if (numTries > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numTries == 1
as per the comment.
* Copy-paste those changes * Fix logic: numTries == 1 instead of numTries > 1 * Add SafeArg and remove exception from the fast failover message
Superseded by #2432. |
* Reapply changes from #2373 * Copy-paste those changes * Fix logic: numTries == 1 instead of numTries > 1 * Add SafeArg and remove exception from the fast failover message * newHost -> randomHost * Fix miscellaneous nits * Fix or suppress warnings
Goals (and why): Fixes #2280.
Implementation Description (bullets):
SafeArg
calls for arguments that are considered safe.Concerns (what feedback would you like?):
Where should we start reviewing?: CassandraClientPool
Priority (whenever / two weeks / yesterday): whenever
Logs before:
Logs after:
This change is