Abnormal executor termination #113

Closed
tpolekhin opened this Issue Sep 1, 2015 · 8 comments


tpolekhin commented Sep 1, 2015

I have a 6-slave Mesos 0.22.1 cluster, with the following resource reservation set for the kafka role:
cpus(kafka):1;mem(kafka):2048;disk(kafka):1500;ports(kafka):[1025-1025];
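
(For context, a static reservation like this is normally configured on each slave via its --resources flag. A sketch follows; the master address is a placeholder, and the exact launch mechanism (systemd unit, init script, DCOS config) varies by deployment.)

# Sketch: start a slave with the kafka-role reservation above.
mesos-slave \
  --master=zk://<zk-host>:2181/mesos \
  --resources="cpus(kafka):1;mem(kafka):2048;disk(kafka):1500;ports(kafka):[1025-1025]"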

I built the jar from source yesterday, with the role fixes included.

I started the scheduler and created the brokers:
dcos kafka add 0..5 --cpus 0.9 --mem 2048 --heap 1500 --api http://...

I started all 6 brokers and began writing to them with Logstash.

After ~1.5 hours one of the brokers restarted unexpectedly. Here are all the logs I could find:

scheduler.log

2015-09-01 09:54:22,039 [Thread-2260] INFO  ly.stealth.mesos.kafka.Scheduler$  - [statusUpdate] broker-1-b1682e7a-a9b1-4ba7-be2b-653dd01ae1a1 TASK_LOST slave:#92-S0 reason:REASON_EXECUTOR_TERMINATED message:Abnormal executor termination
2015-09-01 09:54:22,040 [Thread-2260] INFO  ly.stealth.mesos.kafka.Scheduler$  - Broker 1 failed 1, waiting 1m, next start ~ 2015-09-01 09:55:22Z

executor stderr:

I0901 08:22:10.016270 10077 logging.cpp:172] INFO level logging started!
I0901 08:22:10.017199 10077 exec.cpp:132] Version: 0.22.1
I0901 08:22:10.021268 10095 exec.cpp:206] Executor registered on slave 20150901-080317-100706496-5050-29492-S0
tput: No value for $TERM and no -T specified
tpolekhin commented Sep 1, 2015

Another broker just failed with the same error. This is what I found in controller.log:

[2015-09-01 09:54:30,009] INFO [BrokerChangeListener on Controller 3]: Broker change listener fired for path /brokers/ids with children 3,2,0,5,4 (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
[2015-09-01 09:54:30,027] INFO [BrokerChangeListener on Controller 3]: Newly added brokers: , deleted brokers: 1, all live brokers: 0,5,2,3,4 (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
[2015-09-01 09:54:30,029] INFO [Controller-3-to-broker-1-send-thread], Shutting down (kafka.controller.RequestSendThread)
[2015-09-01 09:54:30,029] INFO [Controller-3-to-broker-1-send-thread], Stopped  (kafka.controller.RequestSendThread)
[2015-09-01 09:54:30,029] INFO [Controller-3-to-broker-1-send-thread], Shutdown completed (kafka.controller.RequestSendThread)
[2015-09-01 09:54:30,030] INFO [Controller 3]: Broker failure callback for 1 (kafka.controller.KafkaController)
[2015-09-01 09:54:30,030] INFO [Controller 3]: Removed ArrayBuffer() from list of shutting down brokers. (kafka.controller.KafkaController)
[2015-09-01 09:54:30,030] INFO [Partition state machine on Controller 3]: Invoking state change to OfflinePartition for partitions  (kafka.controller.PartitionStateMachine)
[2015-09-01 09:54:30,031] INFO [Replica state machine on controller 3]: Invoking state change to OfflineReplica for replicas [Topic=logstash,Partition=0,Replica=1] (kafka.controller.ReplicaStateMachine)
[2015-09-01 09:54:30,034] DEBUG [Controller 3]: Removing replica 1 from ISR 5,1 for partition [logstash,0]. (kafka.controller.KafkaController)
[2015-09-01 09:54:30,044] INFO [Controller 3]: New leader and ISR for partition [logstash,0] is {"leader":5,"leader_epoch":1,"isr":[5]} (kafka.controller.KafkaController)
[2015-09-01 09:54:30,048] WARN [Controller-3-to-broker-2-send-thread], Controller 3 epoch 1 fails to send request Name:UpdateMetadataRequest;Version:0;Controller:3;ControllerEpoch:1;CorrelationId:19;ClientId:id_3-host_shipped-analytics-us-texas-3-dcos-slave-04.novalocal-port_1025;AliveBrokers:id:4,host:shipped-analytics-us-texas-3-dcos-slave-03.novalocal,port:1025,id:0,host:shipped-analytics-us-texas-3-dcos-slave-05.novalocal,port:1025,id:2,host:shipped-analytics-us-texas-3-dcos-slave-02.novalocal,port:1025,id:5,host:shipped-analytics-us-texas-3-dcos-slave-06.novalocal,port:1025,id:3,host:shipped-analytics-us-texas-3-dcos-slave-04.novalocal,port:1025;PartitionState:[logstash,0] -> (LeaderAndIsrInfo:(Leader:5,ISR:5,LeaderEpoch:1,ControllerEpoch:1),ReplicationFactor:2),AllReplicas:5,1) to broker id:2,host:shipped-analytics-us-texas-3-dcos-slave-02.novalocal,port:1025. Reconnecting to broker. (kafka.controller.RequestSendThread)
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
    at kafka.utils.Utils$.read(Utils.scala:381)
    at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
    at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
    at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
    at kafka.network.BlockingChannel.receive(BlockingChannel.scala:111)
    at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:133)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
[2015-09-01 09:54:30,049] DEBUG The stop replica request (delete = true) sent to broker 1 is  (kafka.controller.ControllerBrokerRequestBatch)
[2015-09-01 09:54:30,049] WARN [Controller-3-to-broker-3-send-thread], Controller 3 epoch 1 fails to send request Name:UpdateMetadataRequest;Version:0;Controller:3;ControllerEpoch:1;CorrelationId:19;ClientId:id_3-host_shipped-analytics-us-texas-3-dcos-slave-04.novalocal-port_1025;AliveBrokers:id:4,host:shipped-analytics-us-texas-3-dcos-slave-03.novalocal,port:1025,id:0,host:shipped-analytics-us-texas-3-dcos-slave-05.novalocal,port:1025,id:2,host:shipped-analytics-us-texas-3-dcos-slave-02.novalocal,port:1025,id:5,host:shipped-analytics-us-texas-3-dcos-slave-06.novalocal,port:1025,id:3,host:shipped-analytics-us-texas-3-dcos-slave-04.novalocal,port:1025;PartitionState:[logstash,0] -> (LeaderAndIsrInfo:(Leader:5,ISR:5,LeaderEpoch:1,ControllerEpoch:1),ReplicationFactor:2),AllReplicas:5,1) to broker id:3,host:shipped-analytics-us-texas-3-dcos-slave-04.novalocal,port:1025. Reconnecting to broker. (kafka.controller.RequestSendThread)
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
    at kafka.utils.Utils$.read(Utils.scala:381)
    at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
    at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
    at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
    at kafka.network.BlockingChannel.receive(BlockingChannel.scala:111)
    at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:133)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
[2015-09-01 09:54:30,051] DEBUG The stop replica request (delete = false) sent to broker 1 is [Topic=logstash,Partition=0,Replica=1] (kafka.controller.ControllerBrokerRequestBatch)
[2015-09-01 09:54:30,051] INFO [Controller-3-to-broker-3-send-thread], Controller 3 connected to id:3,host:shipped-analytics-us-texas-3-dcos-slave-04.novalocal,port:1025 for sending state change requests (kafka.controller.RequestSendThread)
[2015-09-01 09:54:30,049] WARN [Controller-3-to-broker-4-send-thread], Controller 3 epoch 1 fails to send request Name:UpdateMetadataRequest;Version:0;Controller:3;ControllerEpoch:1;CorrelationId:19;ClientId:id_3-host_shipped-analytics-us-texas-3-dcos-slave-04.novalocal-port_1025;AliveBrokers:id:4,host:shipped-analytics-us-texas-3-dcos-slave-03.novalocal,port:1025,id:0,host:shipped-analytics-us-texas-3-dcos-slave-05.novalocal,port:1025,id:2,host:shipped-analytics-us-texas-3-dcos-slave-02.novalocal,port:1025,id:5,host:shipped-analytics-us-texas-3-dcos-slave-06.novalocal,port:1025,id:3,host:shipped-analytics-us-texas-3-dcos-slave-04.novalocal,port:1025;PartitionState:[logstash,0] -> (LeaderAndIsrInfo:(Leader:5,ISR:5,LeaderEpoch:1,ControllerEpoch:1),ReplicationFactor:2),AllReplicas:5,1) to broker id:4,host:shipped-analytics-us-texas-3-dcos-slave-03.novalocal,port:1025. Reconnecting to broker. (kafka.controller.RequestSendThread)
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
    at kafka.utils.Utils$.read(Utils.scala:381)
    at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
    at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
    at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
    at kafka.network.BlockingChannel.receive(BlockingChannel.scala:111)
    at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:133)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
[2015-09-01 09:54:30,048] WARN [Controller-3-to-broker-0-send-thread], Controller 3 epoch 1 fails to send request Name:UpdateMetadataRequest;Version:0;Controller:3;ControllerEpoch:1;CorrelationId:19;ClientId:id_3-host_shipped-analytics-us-texas-3-dcos-slave-04.novalocal-port_1025;AliveBrokers:id:4,host:shipped-analytics-us-texas-3-dcos-slave-03.novalocal,port:1025,id:0,host:shipped-analytics-us-texas-3-dcos-slave-05.novalocal,port:1025,id:2,host:shipped-analytics-us-texas-3-dcos-slave-02.novalocal,port:1025,id:5,host:shipped-analytics-us-texas-3-dcos-slave-06.novalocal,port:1025,id:3,host:shipped-analytics-us-texas-3-dcos-slave-04.novalocal,port:1025;PartitionState:[logstash,0] -> (LeaderAndIsrInfo:(Leader:5,ISR:5,LeaderEpoch:1,ControllerEpoch:1),ReplicationFactor:2),AllReplicas:5,1) to broker id:0,host:shipped-analytics-us-texas-3-dcos-slave-05.novalocal,port:1025. Reconnecting to broker. (kafka.controller.RequestSendThread)
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
    at kafka.utils.Utils$.read(Utils.scala:381)
    at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
    at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
    at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
    at kafka.network.BlockingChannel.receive(BlockingChannel.scala:111)
    at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:133)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
[2015-09-01 09:54:30,048] WARN [Controller-3-to-broker-5-send-thread], Controller 3 epoch 1 fails to send request Name:LeaderAndIsrRequest;Version:0;Controller:3;ControllerEpoch:1;CorrelationId:19;ClientId:id_3-host_shipped-analytics-us-texas-3-dcos-slave-04.novalocal-port_1025;Leaders:id:5,host:shipped-analytics-us-texas-3-dcos-slave-06.novalocal,port:1025;PartitionState:(logstash,0) -> (LeaderAndIsrInfo:(Leader:5,ISR:5,LeaderEpoch:1,ControllerEpoch:1),ReplicationFactor:2),AllReplicas:5,1) to broker id:5,host:shipped-analytics-us-texas-3-dcos-slave-06.novalocal,port:1025. Reconnecting to broker. (kafka.controller.RequestSendThread)
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
    at kafka.utils.Utils$.read(Utils.scala:381)
    at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
    at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
    at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
    at kafka.network.BlockingChannel.receive(BlockingChannel.scala:111)
    at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:133)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
[2015-09-01 09:54:30,051] INFO [Controller-3-to-broker-2-send-thread], Controller 3 connected to id:2,host:shipped-analytics-us-texas-3-dcos-slave-02.novalocal,port:1025 for sending state change requests (kafka.controller.RequestSendThread)
[2015-09-01 09:54:30,053] INFO [Controller-3-to-broker-4-send-thread], Controller 3 connected to id:4,host:shipped-analytics-us-texas-3-dcos-slave-03.novalocal,port:1025 for sending state change requests (kafka.controller.RequestSendThread)
[2015-09-01 09:54:30,053] INFO [Controller-3-to-broker-0-send-thread], Controller 3 connected to id:0,host:shipped-analytics-us-texas-3-dcos-slave-05.novalocal,port:1025 for sending state change requests (kafka.controller.RequestSendThread)
[2015-09-01 09:54:30,054] INFO [Controller-3-to-broker-5-send-thread], Controller 3 connected to id:5,host:shipped-analytics-us-texas-3-dcos-slave-06.novalocal,port:1025 for sending state change requests (kafka.controller.RequestSendThread)
[2015-09-01 09:54:30,054] WARN [Channel manager on controller 3]: Not sending request Name: StopReplicaRequest; Version: 0; CorrelationId: 19; ClientId: ; DeletePartitions: false; ControllerId: 3; ControllerEpoch: 1; Partitions: [logstash,0] to broker 1, since it is offline. (kafka.controller.ControllerChannelManager)
[2015-09-01 09:55:25,316] INFO [BrokerChangeListener on Controller 3]: Broker change listener fired for path /brokers/ids with children 3,2,1,0,5,4 (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
[2015-09-01 09:55:25,338] INFO [BrokerChangeListener on Controller 3]: Newly added brokers: 1, deleted brokers: , all live brokers: 0,5,1,2,3,4 (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
[2015-09-01 09:55:25,338] DEBUG [Channel manager on controller 3]: Controller 3 trying to connect to broker 1 (kafka.controller.ControllerChannelManager)
[2015-09-01 09:55:25,340] INFO [Controller-3-to-broker-1-send-thread], Controller 3 connected to id:1,host:shipped-analytics-us-texas-3-dcos-slave-01.novalocal,port:1025 for sending state change requests (kafka.controller.RequestSendThread)
[2015-09-01 09:55:25,340] INFO [Controller 3]: New broker startup callback for 1 (kafka.controller.KafkaController)
[2015-09-01 09:55:25,340] INFO [Controller-3-to-broker-1-send-thread], Starting  (kafka.controller.RequestSendThread)
[2015-09-01 09:55:25,340] INFO [Replica state machine on controller 3]: Invoking state change to OnlineReplica for replicas [Topic=logstash,Partition=0,Replica=1] (kafka.controller.ReplicaStateMachine)
[2015-09-01 09:57:15,567] TRACE [Controller 3]: checking need to trigger partition rebalance (kafka.controller.KafkaController)
[2015-09-01 09:57:15,567] DEBUG [Controller 3]: preferred replicas by broker Map(5 -> Map([logstash,0] -> List(5, 1))) (kafka.controller.KafkaController)
[2015-09-01 09:57:15,567] DEBUG [Controller 3]: topics not in preferred replica Map() (kafka.controller.KafkaController)
[2015-09-01 09:57:15,568] TRACE [Controller 3]: leader imbalance ratio for broker 5 is 0.000000 (kafka.controller.KafkaController)

joestein (Contributor) commented Sep 1, 2015

It sounds like your disks may have filled up. You can set/control that by passing --options on the command line for Kafka configuration changes. It could also be that --log4j-options needs adjusting; per the Kafka defaults these loggers are on TRACE and can also fill up disks.

The controller log isn't going to tell you what the failure was on the broker that failed, just that it failed.

You should look at all of the logs in the sandboxes of the brokers that failed, to see whether any error in there points to the reason for the failure.
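
(A sketch of what adjusting those flags might look like; the "update" subcommand is an assumption mirroring the "add" command above, and the exact option syntax should be checked against `dcos kafka --help`:)

# Hypothetical: cap per-partition retention and raise the broker log level
# so Kafka's data and its own logs don't exhaust the sandbox disk quota.
dcos kafka update 0..5 \
  --options "log.retention.bytes=1073741824" \
  --log4j-options "log4j.rootLogger=INFO,stdout"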


tpolekhin commented Sep 1, 2015

Disk space is clearly not the issue: I've set retention on the topic to 1 GB, and I have plenty of free space on the node.

The logs from the sandbox are in the first post.
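
(For reference, a per-topic retention cap like that is typically applied as a topic-level override. A sketch for Kafka 0.8.x, with the ZooKeeper address as a placeholder:)

# Sketch: cap the logstash topic at ~1 GB per partition.
kafka-topics.sh --zookeeper <zk-host>:2181 --alter \
  --topic logstash --config retention.bytes=1073741824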


joestein (Contributor) commented Sep 4, 2015

In the first post you described a broker crashing but posted the logs for the scheduler. Then you posted the controller logs. If you have the logs from the sandbox the broker crashed in, it would be helpful to see whether anything in there points to why that broker crashed. You might also need to look at the slave logs, in case it was a cgroup freeze or the like (meaning you went over your memory limits). The controller log typically isn't going to tell you why a broker crashed; it may be useful to see in some cases, but usually not for this.
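
(A sketch of what to check; log paths and message text are approximate and vary with the Mesos version and the slave's --log_dir setting:)

# Look for cgroup OOM events in the slave log around the failure time.
grep -i "memory limit exceeded" /var/log/mesos/mesos-slave.INFO
# Check the kernel log in case the OOM killer targeted the broker JVM.
dmesg | grep -i -B2 -A5 "out of memory"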


tpolekhin commented Sep 4, 2015

In the first post you can find the scheduler.log and executor stderr logs. That is all the information I could find regarding the failed broker; all the other logs contain no useful info.


AndriiOmelianenko commented Oct 5, 2015

@joestein @tymofii-polekhin
Hi guys, any updates here?


tpolekhin commented Oct 5, 2015

Looks like ZooKeeper issues.


tpolekhin closed this Oct 5, 2015

ghost commented Nov 23, 2015

How do I solve this Kafka Spark direct approach error?

15/11/23 11:53:06 INFO SimpleConsumer: Reconnect due to socket error: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
Exception in thread "main" org.apache.spark.SparkException: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:366)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:366)
at scala.util.Either.fold(Either.scala:97)
at org.apache.spark.streaming.kafka.KafkaCluster$.checkErrors(KafkaCluster.scala:365)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:422)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:532)
at org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
at Sparkstreaming1.DirectApprouch.Sparkdirectapprouch.main(Sparkdirectapprouch.java:45)
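
(This EOFException from SimpleConsumer often means the metadata.broker.list passed to createDirectStream points at the wrong host or port, for example ZooKeeper's 2181 instead of a broker port, or that the client and broker versions disagree. A rough sanity check, with host, port, and topic as placeholders:)

# Verify the hosts given to createDirectStream are real Kafka brokers
# listening on the expected port.
nc -zv <broker-host> 9092
# Confirm the topic exists and its partitions have live leaders.
kafka-topics.sh --zookeeper <zk-host>:2181 --describe --topic <topic>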

