
Update Kafka (to 0.10) #1405

Closed
codefromthecrypt opened this issue Nov 16, 2016 · 18 comments · Fixed by #1586

Comments

@codefromthecrypt
Member

Last year, we attempted to update Kafka to 0.9, but had to revert it due to the ordering problem it introduced.

#904

Unfortunately, no one responded to the email asking about Kafka update plans: https://groups.google.com/forum/#!topic/zipkin-user/rQpecbvc6Y0

We shouldn't be pinned to 0.8 forever, and we are now getting requests for 0.10 support. I'm wondering how we should address Kafka updates, particularly as the server includes an all-jar. Unless packaging is different, it might imply weird tricks to be able to support multiple versions in the same binary, or a separate build.

Can anyone share their current status wrt Kafka, and/or ideas on how to support multiple versions simply?

cc @prat0318 @dgarson @SimenB @sveisvei @eirslett @kristofa @NegatioN @shakuzen

@codefromthecrypt
Member Author

ps: according to Confluent, it is still fine to use Kafka 0.8 consumers and producers: http://docs.confluent.io/3.0.0/upgrade.html

Since zipkin-server is standalone, it might not need to change. However, zipkin-reporter-java might want a kafka010 module since it shares the classpath with production apps (who might want to upgrade)

@StephenWithPH

Thoughts on revisiting this? Using the 0.8 consumer brings with it a Zookeeper dependency; newer versions of the consumer library use Kafka itself for offset tracking.

#904 sticks with 0.8.x clients until we either make a standalone 0.9 kafka collector or re-evaluate when 0.9 adoption is higher... it's hard to gauge the pace of Kafka broker upgrades in the world at large. Anecdotally, half of the room at the last Kafka meetup was still on 0.8.

The standalone zipkin-kafka-connector approach sounds viable; it's similar to https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/kafka.html.

We've just started digging deep into this part of Zipkin to brainstorm how to carve out the pinned 0.8 consumer. If the above sounds viable, we'll come back with a more detailed sketch.

@codefromthecrypt
Member Author

codefromthecrypt commented May 4, 2017 via email

@codefromthecrypt
Member Author

@StephenWithPH Sorry, you said "newer versions of the consumer library use Kafka itself for offset tracking". (I'm still curious whether you run a custom build or not, as that impacts the direction here.)

Can you elaborate on this? Is this an automatic feature or would we need to expose configuration for it. Code example would be best.

Also, are you saying that brokers now self-discover? So you pass a broker URL and that's it?

I'm also interested in progressing this, but need to pick your brain a bit. I think we'll need to do both 0.8 and 0.10, so this is the tricky part.

@codefromthecrypt
Member Author

pps: are you using Docker? If so, that gives us some more options on how to address this.

@StephenWithPH

I'll answer the questions before diving into the details:

  • we are using Docker in prod

  • we use the factory openzipkin Docker image ❤️

  • newer versions of the Kafka consumer library write to a consumer offsets topic in Kafka itself. That's automatic... I don't believe there's any way not to do that, and no user code is needed. This Confluent blog post gives lots of details; note that the post is somewhat dated... this behavior is no longer in beta.

  • newer consumers do indeed discover all the brokers; see bootstrap.servers here.
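To make the "no Zookeeper" point concrete, here's a minimal sketch of what the new-consumer configuration involves, assuming the standard kafka-clients property names (`bootstrap.servers`, `group.id`, `enable.auto.commit`); the group id and broker addresses below are illustrative:

```java
import java.util.Properties;

public class NewConsumerConfig {
    // Builds the configuration a 0.9+/0.10 "new consumer" needs. Note there is
    // no zookeeper.connect anywhere: a seed list of brokers is enough, and the
    // client discovers the rest of the cluster from it.
    public static Properties consumerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", "zipkin");
        // Offsets are committed to the internal __consumer_offsets topic in
        // Kafka itself; this is the default, no user code required.
        props.put("enable.auto.commit", "true");
        return props;
    }

    public static void main(String[] args) {
        Properties p = consumerProps("broker1:9092,broker2:9092");
        System.out.println(p.getProperty("bootstrap.servers"));
        // A real collector would then construct the client, e.g.:
        // new KafkaConsumer<byte[], byte[]>(p).subscribe(Collections.singletonList("zipkin"));
    }
}
```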

From a high level, we strongly prefer to protect our Zookeeper from all of our various Kafka consumers. That's why we want to avoid indirectly using the old style consumer via Zipkin. Quoting some docs: ZooKeeper does not scale extremely well (especially for writes) when there are a large number of offsets (i.e., consumer-count * partition-count).

I agree that maintaining support for 0.8.x is necessary so we don't break existing users whose brokers are 0.8.x. We'd like to add support for the 0.10.x consumer (since our brokers are 0.10.x). The consumer-to-broker version limitation is still a thing, but sounds marginally less painful with 0.10.x.

It's likely a matter of time until 0.9.x support is a feature request from other Zipkin users. This is what led us to link to the Apache Flink manner of handling "pluggable" Kafka consumer (or producer) client versions.

@codefromthecrypt
Member Author

codefromthecrypt commented May 4, 2017 via email

@codefromthecrypt
Member Author

codefromthecrypt commented May 4, 2017 via email

@StephenWithPH

We do have cycles. We'll be back on this early next week.

@dgrabows
Contributor

dgrabows commented May 8, 2017

I've been working on a Kafka 0.10 collector to see what would be involved in implementing one, primarily because the cluster I need Zipkin to consume span data from requires TLS. This isn't supported by the 0.8 consumer.

I have followed the pattern laid out by the Kafka 0.8 collector. There is a kafka10 collector module and corresponding auto-configuration module. The zipkin-server module has been modified to run the collector when KAFKA10_BOOTSTRAP_SERVERS is set. It is functional at this point, although there is some cleanup to do in the collector, and I suspect including the kafka10 collector in zipkin-server breaks the 0.8 collector. This is because the 0.10.2.0 version of kafka-clients takes precedence over the 0.8.x.x kafka-clients when resolving dependencies for zipkin-server. I'm not sure what the best way to address that classpath conflict is.

The work I've done so far is on the kafka10-collector branch of my fork. There is some documentation on how to configure the collector at https://github.com/dgrabows/zipkin/tree/kafka10-collector/zipkin-server#kafka-010-collector.

I'm happy to do what I can to contribute this back or help use this as the basis for other efforts.

@codefromthecrypt
Member Author

codefromthecrypt commented May 8, 2017 via email

@dgrabows
Contributor

dgrabows commented May 8, 2017

I will change the environment variables to KAFKA_* instead of KAFKA10_*.

What about the underlying properties in https://github.com/dgrabows/zipkin/blob/kafka10-collector/zipkin-server/src/main/resources/zipkin-server-shared.yml? Right now, there are separate sets of properties for each Kafka collector, under the zipkin.collector.kafka and zipkin.collector.kafka10 namespaces. I'm leaning towards re-using the zipkin.collector.kafka properties for the kafka10 collector. Most of the properties are applicable to both collectors, and kafka10.consumer-threads could be dropped in favor of kafka.streams. My only lingering concern is that some properties, e.g. max-message-size, will only apply to one version of the Kafka collector and that may lead to confusion. I think that can be addressed sufficiently through documentation.
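A merged layout might look something like the following sketch. The property names under `zipkin.collector.kafka` are illustrative, not the actual contents of zipkin-server-shared.yml; the idea is simply that shared properties live in one namespace, with version-specific ones documented as such:

```yaml
zipkin:
  collector:
    kafka:
      # shared by both collectors
      topic: ${KAFKA_TOPIC:zipkin}
      group-id: ${KAFKA_GROUP_ID:zipkin}
      streams: ${KAFKA_STREAMS:1}
      # 0.8 collector only: Zookeeper connect string
      zookeeper: ${KAFKA_ZOOKEEPER:}
      # 0.10 collector only: broker bootstrap list
      bootstrap-servers: ${KAFKA_BOOTSTRAP_SERVERS:}
      # 0.8 collector only (per the concern above)
      max-message-size: ${KAFKA_MAX_MESSAGE_SIZE:1048576}
```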

Thoughts?

@NithinMadhavanpillai

NithinMadhavanpillai commented May 8, 2017

Hi @adriancole ,
@StephenWithPH and I were looking at building a new Kafka 0.10 collector implementation from scratch based on your inputs. Today we found that @dgrabows has already covered a lot of this ground, and we would like to help you two move it forward. Could you please let us know how we can help get this to master?

@dgrabows
Contributor

dgrabows commented May 8, 2017

@NithinMadhavanpillai A couple of things that I would find helpful:

  • Feedback on the implementation of https://github.com/dgrabows/zipkin/tree/kafka10-collector/zipkin-collector/kafka10. I have a couple of todo comments in there that I intend to circle back to. Is there anything else about how this has been implemented that concerns you, isn't a good fit for your expected usage, or you think could be improved?
  • Thoughts on how to integrate this collector into zipkin-server without breaking the current 0.8 Kafka collector. The approach I've taken most likely breaks the 0.8 collector because of dependency resolution conflicts on different versions of kafka-clients. In the context of a docker deployment, Adrian mentioned checking for the Kafka collector environment variables and adjusting the classpath based on that. It sounds like that might work for you, because you're running Zipkin using docker. I will not be deploying it using docker. I could do the same thing with a wrapper shell script and I'd be fine with that solution. I'm not sure how well that generalizes to a solution that can be contributed back to the zipkin project, since shell scripts imply implementation and testing across multiple platforms (Linux, Windows, etc.).
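As a sketch of that wrapper idea: the script only needs to append the right kafka-clients jar depending on which environment variable is set. All jar names and paths below are hypothetical, not the real zipkin distribution layout:

```shell
# choose_classpath: echoes a classpath for zipkin-server, appending the
# kafka-clients jar matching whichever collector env var is set.
# Jar names/paths are illustrative only.
choose_classpath() {
  cp="zipkin-server.jar"
  if [ -n "$KAFKA10_BOOTSTRAP_SERVERS" ]; then
    cp="$cp:lib/kafka-clients-0.10.2.0.jar"   # new consumer
  elif [ -n "$KAFKA_ZOOKEEPER" ]; then
    cp="$cp:lib/kafka-clients-0.8.2.2.jar"    # old consumer
  fi
  echo "$cp"
}

KAFKA10_BOOTSTRAP_SERVERS=broker:9092 choose_classpath
```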

@dgrabows
Contributor

dgrabows commented May 8, 2017

Another option for packaging the Kafka 0.10 collector would be to not integrate it into zipkin-server, and package it as a standalone collector instead. For example, use the same approach as the AWS SQS collector. That would be fine for my purposes, as I plan to have the collector running on a separate host from the query api for other reasons. I don't know if that's a desirable approach when considering ease-of-use for people using Zipkin and Kafka 0.10+ in general.

@codefromthecrypt
Member Author

codefromthecrypt commented May 8, 2017 via email

@codefromthecrypt
Member Author

codefromthecrypt commented May 8, 2017 via email

@dgrabows
Contributor

dgrabows commented May 9, 2017

That all makes sense. I created #1586 based on what I've got so far. I tried to capture the outstanding work I'm aware of on the PR.
