This repository has been archived by the owner on May 20, 2020. It is now read-only.

Integrate Cruise Control #35

Open
2 of 4 tasks
krallistic opened this issue Oct 6, 2017 · 2 comments

krallistic commented Oct 6, 2017

LinkedIn open-sourced Cruise Control (https://github.com/linkedin/cruise-control), a tool for rebalancing Kafka clusters. Since LinkedIn has far more experience running Kafka clusters, we should use their algorithms.

As a first integration, the following steps have to be done:

  • CC Dockerfile & Deployment
  • Kafka Dockerfile with CruiseControlMetricsReporter
  • Upsizing with CC rebalancing
  • Downsizing with CC rebalancing

Automatic rebalancing of topics (if a skew occurs) is out of scope for the first integration.
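
For the upsizing and downsizing steps, one option would be to call the Cruise Control REST service after the broker count changes. The following is only a rough sketch: the base URL (host and port) is hypothetical, and the `add_broker` / `remove_broker` endpoints with their `brokerid` and `dryrun` parameters reflect my understanding of the Cruise Control REST API, so they should be checked against the Cruise Control documentation for the version in use. The helper only builds the URL; sending the request is left out.

```python
from typing import List
from urllib.parse import urlencode

# Hypothetical service address; adjust to wherever Cruise Control is deployed.
CC_BASE_URL = "http://cruise-control:9090/kafkacruisecontrol"


def scaling_url(action: str, broker_ids: List[int], dry_run: bool = True) -> str:
    """Build the URL for an assumed add_broker / remove_broker request."""
    if action not in ("add_broker", "remove_broker"):
        raise ValueError(f"unsupported action: {action}")
    if not broker_ids:
        raise ValueError("at least one broker id is required")
    params = {
        "brokerid": ",".join(str(b) for b in broker_ids),
        # Default to a dry run so nothing is moved by accident.
        "dryrun": str(dry_run).lower(),
    }
    # safe="," keeps the comma-separated id list readable in the query string.
    return f"{CC_BASE_URL}/{action}?{urlencode(params, safe=',')}"
```

After scaling up to include brokers 3 and 4, `scaling_url("add_broker", [3, 4])` would give the URL to request. Before scaling down, `scaling_url("remove_broker", [4], dry_run=False)` would ask Cruise Control to move replicas off broker 4 first.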

ankon commented Oct 26, 2017

Right now this seems to fail with confluentinc/cp-kafka:3.3.0:

[2017-10-26 15:22:28,039] INFO KafkaConfig values: 
	advertised.host.name = null
	advertised.listeners = PLAINTEXT://kafka-0.kafka.development.svc.cluster.local:9092
	advertised.port = null
	alter.config.policy.class.name = null
	authorizer.class.name = 
	auto.create.topics.enable = true
	auto.leader.rebalance.enable = true
	background.threads = 10
	broker.id = 0
	broker.id.generation.enable = true
	broker.rack = null
	compression.type = producer
	connections.max.idle.ms = 600000
	controlled.shutdown.enable = true
	controlled.shutdown.max.retries = 3
	controlled.shutdown.retry.backoff.ms = 5000
	controller.socket.timeout.ms = 30000
	create.topic.policy.class.name = null
	default.replication.factor = 1
	delete.records.purgatory.purge.interval.requests = 1
	delete.topic.enable = false
	fetch.purgatory.purge.interval.requests = 1000
	group.initial.rebalance.delay.ms = 3000
	group.max.session.timeout.ms = 300000
	group.min.session.timeout.ms = 6000
	host.name = 
	inter.broker.listener.name = null
	inter.broker.protocol.version = 0.11.0-IV2
	leader.imbalance.check.interval.seconds = 300
	leader.imbalance.per.broker.percentage = 10
	listener.security.protocol.map = SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,TRACE:TRACE,SASL_SSL:SASL_SSL,PLAINTEXT:PLAINTEXT
	listeners = PLAINTEXT://0.0.0.0:9092
	log.cleaner.backoff.ms = 15000
	log.cleaner.dedupe.buffer.size = 134217728
	log.cleaner.delete.retention.ms = 86400000
	log.cleaner.enable = true
	log.cleaner.io.buffer.load.factor = 0.9
	log.cleaner.io.buffer.size = 524288
	log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
	log.cleaner.min.cleanable.ratio = 0.5
	log.cleaner.min.compaction.lag.ms = 0
	log.cleaner.threads = 1
	log.cleanup.policy = [delete]
	log.dir = /tmp/kafka-logs
	log.dirs = /var/lib/kafka/data
	log.flush.interval.messages = 9223372036854775807
	log.flush.interval.ms = null
	log.flush.offset.checkpoint.interval.ms = 60000
	log.flush.scheduler.interval.ms = 9223372036854775807
	log.flush.start.offset.checkpoint.interval.ms = 60000
	log.index.interval.bytes = 4096
	log.index.size.max.bytes = 10485760
	log.message.format.version = 0.11.0-IV2
	log.message.timestamp.difference.max.ms = 9223372036854775807
	log.message.timestamp.type = CreateTime
	log.preallocate = false
	log.retention.bytes = -1
	log.retention.check.interval.ms = 300000
	log.retention.hours = 168
	log.retention.minutes = null
	log.retention.ms = null
	log.roll.hours = 168
	log.roll.jitter.hours = 0
	log.roll.jitter.ms = null
	log.roll.ms = null
	log.segment.bytes = 1073741824
	log.segment.delete.delay.ms = 60000
	max.connections.per.ip = 2147483647
	max.connections.per.ip.overrides = 
	message.max.bytes = 1000012
	metric.reporters = [com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter]
	metrics.num.samples = 2
	metrics.recording.level = INFO
	metrics.sample.window.ms = 30000
	min.insync.replicas = 1
	num.io.threads = 8
	num.network.threads = 3
	num.partitions = 1
	num.recovery.threads.per.data.dir = 1
	num.replica.fetchers = 1
	offset.metadata.max.bytes = 4096
	offsets.commit.required.acks = -1
	offsets.commit.timeout.ms = 5000
	offsets.load.buffer.size = 5242880
	offsets.retention.check.interval.ms = 600000
	offsets.retention.minutes = 1440
	offsets.topic.compression.codec = 0
	offsets.topic.num.partitions = 50
	offsets.topic.replication.factor = 3
	offsets.topic.segment.bytes = 104857600
	port = 9092
	principal.builder.class = class org.apache.kafka.common.security.auth.DefaultPrincipalBuilder
	producer.purgatory.purge.interval.requests = 1000
	queued.max.requests = 500
	quota.consumer.default = 9223372036854775807
	quota.producer.default = 9223372036854775807
	quota.window.num = 11
	quota.window.size.seconds = 1
	replica.fetch.backoff.ms = 1000
	replica.fetch.max.bytes = 1048576
	replica.fetch.min.bytes = 1
	replica.fetch.response.max.bytes = 10485760
	replica.fetch.wait.max.ms = 500
	replica.high.watermark.checkpoint.interval.ms = 5000
	replica.lag.time.max.ms = 10000
	replica.socket.receive.buffer.bytes = 65536
	replica.socket.timeout.ms = 30000
	replication.quota.window.num = 11
	replication.quota.window.size.seconds = 1
	request.timeout.ms = 30000
	reserved.broker.max.id = 1000
	sasl.enabled.mechanisms = [GSSAPI]
	sasl.kerberos.kinit.cmd = /usr/bin/kinit
	sasl.kerberos.min.time.before.relogin = 60000
	sasl.kerberos.principal.to.local.rules = [DEFAULT]
	sasl.kerberos.service.name = null
	sasl.kerberos.ticket.renew.jitter = 0.05
	sasl.kerberos.ticket.renew.window.factor = 0.8
	sasl.mechanism.inter.broker.protocol = GSSAPI
	security.inter.broker.protocol = PLAINTEXT
	socket.receive.buffer.bytes = 102400
	socket.request.max.bytes = 104857600
	socket.send.buffer.bytes = 102400
	ssl.cipher.suites = null
	ssl.client.auth = none
	ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
	ssl.endpoint.identification.algorithm = null
	ssl.key.password = null
	ssl.keymanager.algorithm = SunX509
	ssl.keystore.location = null
	ssl.keystore.password = null
	ssl.keystore.type = JKS
	ssl.protocol = TLS
	ssl.provider = null
	ssl.secure.random.implementation = null
	ssl.trustmanager.algorithm = PKIX
	ssl.truststore.location = null
	ssl.truststore.password = null
	ssl.truststore.type = JKS
	transaction.abort.timed.out.transaction.cleanup.interval.ms = 60000
	transaction.max.timeout.ms = 900000
	transaction.remove.expired.transaction.cleanup.interval.ms = 3600000
	transaction.state.log.load.buffer.size = 5242880
	transaction.state.log.min.isr = 2
	transaction.state.log.num.partitions = 50
	transaction.state.log.replication.factor = 3
	transaction.state.log.segment.bytes = 104857600
	transactional.id.expiration.ms = 604800000
	unclean.leader.election.enable = false
	zookeeper.connect = zookeeper.development.svc.cluster.local
	zookeeper.connection.timeout.ms = null
	zookeeper.session.timeout.ms = 6000
	zookeeper.set.acl = false
	zookeeper.sync.time.ms = 2000
 (kafka.server.KafkaConfig)
[2017-10-26 15:22:28,105] WARN The support metrics collection feature ("Metrics") of Proactive Support is disabled. (io.confluent.support.metrics.SupportedServerStartable)
[2017-10-26 15:22:28,106] INFO starting (kafka.server.KafkaServer)
[2017-10-26 15:22:28,107] INFO Connecting to zookeeper on zookeeper.development.svc.cluster.local (kafka.server.KafkaServer)
[2017-10-26 15:22:28,119] INFO Starting ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
[2017-10-26 15:22:28,123] INFO Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,123] INFO Client environment:host.name=kafka-0.kafka.development.svc.cluster.local (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,123] INFO Client environment:java.version=1.8.0_102 (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,123] INFO Client environment:java.vendor=Azul Systems, Inc. (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,123] INFO Client environment:java.home=/usr/lib/jvm/zulu-8-amd64/jre (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,123] INFO Client environment:java.class.path=:/usr/bin/../share/java/kafka/connect-file-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/kafka-streams-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/hk2-locator-2.5.0-b05.jar:/usr/bin/../share/java/kafka/kafka-streams-examples-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/kafka_2.11-0.11.0.0-cp1-test.jar:/usr/bin/../share/java/kafka/kafka_2.11-0.11.0.0-cp1-javadoc.jar:/usr/bin/../share/java/kafka/connect-api-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/argparse4j-0.7.0.jar:/usr/bin/../share/java/kafka/support-metrics-common-3.3.0.jar:/usr/bin/../share/java/kafka/jersey-container-servlet-core-2.24.jar:/usr/bin/../share/java/kafka/commons-lang3-3.1.jar:/usr/bin/../share/java/kafka/guava-20.0.jar:/usr/bin/../share/java/kafka/zookeeper-3.4.10.jar:/usr/bin/../share/java/kafka/kafka.jar:/usr/bin/../share/java/kafka/jetty-servlet-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/commons-validator-1.4.1.jar:/usr/bin/../share/java/kafka/commons-codec-1.9.jar:/usr/bin/../share/java/kafka/httpclient-4.5.2.jar:/usr/bin/../share/java/kafka/scala-parser-combinators_2.11-1.0.4.jar:/usr/bin/../share/java/kafka/jackson-databind-2.8.5.jar:/usr/bin/../share/java/kafka/jersey-guava-2.24.jar:/usr/bin/../share/java/kafka/commons-lang3-3.5.jar:/usr/bin/../share/java/kafka/commons-compress-1.8.1.jar:/usr/bin/../share/java/kafka/jetty-io-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/kafka-log4j-appender-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/jersey-common-2.24.jar:/usr/bin/../share/java/kafka/jetty-util-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/jetty-servlets-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/hk2-utils-2.5.0-b05.jar:/usr/bin/../share/java/kafka/jackson-jaxrs-base-2.8.5.jar:/usr/bin/../share/java/kafka/jackson-module-jaxb-annotations-2.8.5.jar:/usr/bin/../share/java/kafka/jackson-annotations-2.8.5.jar:/usr/bin/../share/java/kafka/aopalliance-repackaged-2.5.0-b05.jar:/usr/bin/../share/java/ka
fka/kafka_2.11-0.11.0.0-cp1-test-sources.jar:/usr/bin/../share/java/kafka/connect-json-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/jackson-core-asl-1.9.13.jar:/usr/bin/../share/java/kafka/rocksdbjni-5.0.1.jar:/usr/bin/../share/java/kafka/commons-digester-1.8.1.jar:/usr/bin/../share/java/kafka/kafka_2.11-0.11.0.0-cp1-scaladoc.jar:/usr/bin/../share/java/kafka/metrics-core-2.2.0.jar:/usr/bin/../share/java/kafka/jetty-server-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/kafka_2.11-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/log4j-1.2.17.jar:/usr/bin/../share/java/kafka/jersey-media-jaxb-2.24.jar:/usr/bin/../share/java/kafka/jackson-mapper-asl-1.9.13.jar:/usr/bin/../share/java/kafka/javax.inject-1.jar:/usr/bin/../share/java/kafka/lz4-1.3.0.jar:/usr/bin/../share/java/kafka/hk2-api-2.5.0-b05.jar:/usr/bin/../share/java/kafka/commons-beanutils-1.8.3.jar:/usr/bin/../share/java/kafka/jopt-simple-5.0.3.jar:/usr/bin/../share/java/kafka/javax.annotation-api-1.2.jar:/usr/bin/../share/java/kafka/javax.servlet-api-3.1.0.jar:/usr/bin/../share/java/kafka/plexus-utils-3.0.24.jar:/usr/bin/../share/java/kafka/jackson-jaxrs-json-provider-2.8.5.jar:/usr/bin/../share/java/kafka/maven-artifact-3.5.0.jar:/usr/bin/../share/java/kafka/validation-api-1.1.0.Final.jar:/usr/bin/../share/java/kafka/connect-transforms-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/kafka_2.11-0.11.0.0-cp1-sources.jar:/usr/bin/../share/java/kafka/commons-logging-1.2.jar:/usr/bin/../share/java/kafka/commons-collections-3.2.1.jar:/usr/bin/../share/java/kafka/javax.inject-2.5.0-b05.jar:/usr/bin/../share/java/kafka/jetty-continuation-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/kafka-clients-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/paranamer-2.7.jar:/usr/bin/../share/java/kafka/jetty-security-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/osgi-resource-locator-1.0.1.jar:/usr/bin/../share/java/kafka/zkclient-0.10.jar:/usr/bin/../share/java/kafka/javassist-3.21.0-GA.jar:/usr/bin/../share/java/kafka/snappy
-java-1.1.2.6.jar:/usr/bin/../share/java/kafka/httpmime-4.5.2.jar:/usr/bin/../share/java/kafka/jetty-http-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/connect-runtime-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/slf4j-log4j12-1.7.25.jar:/usr/bin/../share/java/kafka/slf4j-api-1.7.25.jar:/usr/bin/../share/java/kafka/scala-library-2.11.11.jar:/usr/bin/../share/java/kafka/kafka-tools-0.11.0.0-cp1.jar:/usr/bin/../share/java/kafka/avro-1.8.2.jar:/usr/bin/../share/java/kafka/jersey-container-servlet-2.24.jar:/usr/bin/../share/java/kafka/xz-1.5.jar:/usr/bin/../share/java/kafka/javax.ws.rs-api-2.0.1.jar:/usr/bin/../share/java/kafka/support-metrics-client-3.3.0.jar:/usr/bin/../share/java/kafka/jackson-core-2.8.5.jar:/usr/bin/../share/java/kafka/jersey-server-2.24.jar:/usr/bin/../share/java/kafka/reflections-0.9.11.jar:/usr/bin/../share/java/kafka/httpcore-4.4.4.jar:/usr/bin/../share/java/kafka/jersey-client-2.24.jar:/usr/bin/../share/java/confluent-support-metrics/*:/usr/share/java/confluent-support-metrics/* (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,124] INFO Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,124] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,124] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,124] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,124] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,124] INFO Client environment:os.version=4.7.2 (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,124] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,124] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,124] INFO Client environment:user.dir=/ (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,125] INFO Initiating client connection, connectString=zookeeper.development.svc.cluster.local sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@3427b02d (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,137] INFO Waiting for keeper state SyncConnected (org.I0Itec.zkclient.ZkClient)
[2017-10-26 15:22:28,143] INFO Opening socket connection to server 172.17.0.6/172.17.0.6:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2017-10-26 15:22:28,206] INFO Socket connection established to 172.17.0.6/172.17.0.6:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2017-10-26 15:22:28,215] INFO Session establishment complete on server 172.17.0.6/172.17.0.6:2181, sessionid = 0x15f584510a5000f, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2017-10-26 15:22:28,217] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
[2017-10-26 15:22:28,318] INFO Cluster ID = nWDeBixwReaENQBQy5o5oA (kafka.server.KafkaServer)
[2017-10-26 15:22:28,327] WARN No meta.properties file under dir /var/lib/kafka/data/meta.properties (kafka.server.BrokerMetadataCheckpoint)
[2017-10-26 15:22:28,339] FATAL [Kafka Server 0], Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.kafka.common.KafkaException: com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter ClassNotFoundException exception occurred
	at org.apache.kafka.common.config.AbstractConfig.getConfiguredInstances(AbstractConfig.java:288)
	at kafka.server.KafkaServer.startup(KafkaServer.scala:202)
	at io.confluent.support.metrics.SupportedServerStartable.startup(SupportedServerStartable.java:102)
	at io.confluent.support.metrics.SupportedKafka.main(SupportedKafka.java:49)
Caused by: java.lang.ClassNotFoundException: com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.kafka.common.utils.Utils.newInstance(Utils.java:300)
	at org.apache.kafka.common.config.AbstractConfig.getConfiguredInstances(AbstractConfig.java:286)
	... 3 more
[2017-10-26 15:22:28,343] INFO [Kafka Server 0], shutting down (kafka.server.KafkaServer)
[2017-10-26 15:22:28,348] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
[2017-10-26 15:22:28,350] INFO Session: 0x15f584510a5000f closed (org.apache.zookeeper.ZooKeeper)
[2017-10-26 15:22:28,352] INFO EventThread shut down for session: 0x15f584510a5000f (org.apache.zookeeper.ClientCnxn)
[2017-10-26 15:22:28,352] INFO [Kafka Server 0], shut down completed (kafka.server.KafkaServer)
[2017-10-26 15:22:28,368] INFO [Kafka Server 0], shutting down (kafka.server.KafkaServer)
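
The `ClassNotFoundException` fits the `java.class.path` line in the log above: none of the listed entries has "cruise-control" in its file name. A small sketch for checking this on a pasted class path follows. The search string is a guess, since the real jar name depends on how the reporter is built. The wildcard entries (`.../*`) cannot be checked from the log alone, because their contents are not printed.

```python
from typing import List


def find_jars(class_path: str, needle: str) -> List[str]:
    """Return class-path entries whose file name contains `needle`."""
    entries = [e for e in class_path.split(":") if e]
    return [e for e in entries if needle in e.rsplit("/", 1)[-1]]


def wildcard_entries(class_path: str) -> List[str]:
    """Return directory wildcards (dir/*); the log does not show their contents."""
    return [e for e in class_path.split(":") if e.endswith("/*")]


# Short excerpt of the class path from the log above (not the full value).
EXCERPT = (
    ":/usr/bin/../share/java/kafka/kafka.jar"
    ":/usr/bin/../share/java/kafka/metrics-core-2.2.0.jar"
    ":/usr/share/java/confluent-support-metrics/*"
)

missing = not find_jars(EXCERPT, "cruise-control")  # True for this excerpt
unchecked = wildcard_entries(EXCERPT)               # directories to inspect by hand
```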

krallistic (owner, author) commented

HEAD is currently pretty unstable because of that refactoring. (With Cruise Control there will unfortunately be custom images, based on the Confluent one.)
The latest tagged version (0.2.0) is stable.
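
As a rough illustration of what such a custom image would need besides the reporter jar, the broker has to be started with `metric.reporters` set to the reporter class (the value shown in the log above). The sketch below assumes the Confluent images' convention of turning `KAFKA_`-prefixed environment variables into broker properties (upper-case, dots replaced by underscores). That convention is an assumption to check against the image documentation, and `to_env_name` is a hypothetical helper.

```python
from typing import Dict

# Reporter class name as it appears in the broker log above.
REPORTER_CLASS = (
    "com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter"
)


def to_env_name(prop: str, prefix: str = "KAFKA_") -> str:
    """Map a broker property name to the assumed environment-variable name."""
    return prefix + prop.upper().replace(".", "_")


def reporter_env() -> Dict[str, str]:
    """Environment for a broker container that enables the metrics reporter."""
    return {to_env_name("metric.reporters"): REPORTER_CLASS}
```

The resulting mapping could be placed in the `env` section of the broker's pod spec. The reporter jar still has to be on the broker's class path inside the image.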

krallistic added a commit that referenced this issue Nov 3, 2017