Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On WAN replication overrun, server logging is excessive #7242

Closed
StanAccy opened this issue Jan 4, 2016 · 13 comments

Comments

Projects
None yet
7 participants
@StanAccy
Copy link

commented Jan 4, 2016

3.6-122315-Enterprise:

When a WAN Q overrun occurs (with exception enabled), the source cluster logs an excessive amount of noise. This seems to be logged, 1 stacktrace per IMap partition:

2016-01-04 18:02:13,880 ERROR [hz._hzInstance_1_WAN1.partition-operation.thread-5] com.hazelcast.client.impl.protocol.task.map.MapPutAllMessageTask - [127.0.0.1]:5701 [WAN1] [3.6-122315] While executing request: com.hazelcast.client.impl.protocol.codec.MapPutAllCodec$RequestParameters@64e82635 -> null
com.hazelcast.enterprise.wan.WANReplicationQueueFullException
    at com.hazelcast.enterprise.wan.replication.AbstractWanReplication.checkWanReplicationQueues(AbstractWanReplication.java:332)
    at com.hazelcast.enterprise.wan.WanReplicationPublisherDelegate.checkWanReplicationQueues(WanReplicationPublisherDelegate.java:79)
    at com.hazelcast.map.impl.MapContainer.checkWanReplicationQueues(MapContainer.java:223)
    at com.hazelcast.map.impl.operation.WANAwareOperationProvider.checkWanReplicationQueues(WANAwareOperationProvider.java:173)
    at com.hazelcast.map.impl.operation.WANAwareOperationProvider.createPutAllOperation(WANAwareOperationProvider.java:129)
    at com.hazelcast.client.impl.protocol.task.map.MapPutAllMessageTask.prepareOperation(MapPutAllMessageTask.java:49)
    at com.hazelcast.client.impl.protocol.task.AbstractPartitionMessageTask.processMessage(AbstractPartitionMessageTask.java:58)
    at com.hazelcast.client.impl.protocol.task.AbstractMessageTask.initializeAndProcessMessage(AbstractMessageTask.java:121)
    at com.hazelcast.client.impl.protocol.task.AbstractMessageTask.run(AbstractMessageTask.java:100)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:127)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.processPartitionSpecificRunnable(OperationThread.java:159)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.process(OperationThread.java:142)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.doRun(OperationThread.java:124)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.run(OperationThread.java:99)
@pveentjer

This comment has been minimized.

Copy link
Member

commented Jan 5, 2016

There is no reason to log this exception. Just logging actual message would be sufficient. But even in that case you still can get a huge amount of logging. So probably best to log it at most every x seconds and include the total number of failures?

Wan replication queue full: occurrences 105853

This way there will be total control on the number of log messages.

@emrahkocaman

This comment has been minimized.

Copy link
Contributor

commented Jan 5, 2016

I don't think that we can easily get rid of these logs.
in "putAll" operation,client divides the input smaller batches depending on their key and sends a "putAll" operation for each partition. (see : https://github.com/hazelcast/hazelcast/blob/master/hazelcast-client/src/main/java/com/hazelcast/client/proxy/ClientMapProxy.java#L1376)
This is why you're seeing lots of traces.

And WAN side just throws the exception, no logging is done. Logging is done by client engine.
(https://github.com/hazelcast/hazelcast/blob/master/hazelcast/src/main/java/com/hazelcast/client/impl/ClientEngineImpl.java#L446)

@emrahkocaman emrahkocaman added this to the 3.7 milestone Jan 5, 2016

@gregrluck

This comment has been minimized.

Copy link
Contributor

commented Jan 5, 2016

There is no reason to log this exception at all. That we throwing an exception is sufficient.

@emrahkocaman

This comment has been minimized.

Copy link
Contributor

commented Jan 5, 2016

We're only throwing, there is no logging on WAN side, but as seen here
, client engine is logging every exception.

I think all we can do for now is, adding an instance of check to client engine impl to see if the incoming throwable is WAN exception and ignore it.

Dear Clients team, if you've any objections to this approach or better alternatives in mind, I'd like to hear them.

@gregrluck

This comment has been minimized.

Copy link
Contributor

commented Jan 5, 2016

Saying ignore is ambiguous. What we want is:

  1. Not to log this exception on the client side.
  2. We do want the exception thrown to the calling user code. i.e. we do not want to swallow this exception. It is needed to implement back pressure in client code.
@emrahkocaman

This comment has been minimized.

Copy link
Contributor

commented Jan 6, 2016

I meant ignoring logging.
Here is the line where logging is done;
https://github.com/hazelcast/hazelcast/blob/master/hazelcast/src/main/java/com/hazelcast/client/impl/ClientEngineImpl.java#L446

What I proposed was, adding an instance of check to this line and skip logging part if exception is instance of WANReplicationQueueFullException

  1. Exceptions are logged in server side, I think there is no problem in client side. My proposed changes are in server side also.
  2. Exception won't be swallowed. This is just logging part.
@pveentjer

This comment has been minimized.

Copy link
Member

commented Jan 6, 2016

An exception should be thrown or logged; not both. If the client engine is throwing and logging an exception, it is doing too much. So probably we should get rid of the exception logging in the client engine.

@sancar

This comment has been minimized.

Copy link
Member

commented Jan 6, 2016

I don't want to lose it completely because sometimes it helps a lot. How about printing in FINEST log level ?

@pveentjer

This comment has been minimized.

Copy link
Member

commented Jan 6, 2016

I think having it at finest is ok since it is meant for debugging anyway. And since by default the system isn't using finest, it behaves like propagation only. So then it isn't doing too much.

@pveentjer

This comment has been minimized.

Copy link
Member

commented Jan 6, 2016

Do we want to wait for 3.7 to get this fixed? Imho it is a small change in the ClientEngine and without this fix, there could be a serious performance degradation.

@jerrinot

This comment has been minimized.

Copy link
Contributor

commented Jan 6, 2016

3.6.1 ?

@pveentjer

This comment has been minimized.

Copy link
Member

commented Jan 6, 2016

why not in 3.6? Imho it is a log level change. @sancar WDYT

@sancar

This comment has been minimized.

Copy link
Member

commented Jan 6, 2016

I am ok with 3.6 with only log level change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.