Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kafka: Add broker-level metrics-collecting filter #8188

Merged
merged 59 commits into from
Jan 6, 2020

Conversation

adamkotwasinski
Copy link
Contributor

@adamkotwasinski adamkotwasinski commented Sep 9, 2019

Description: Simple Kafka filter for broker. Grabs and decodes messages and updates metrics.
If message could not be recognised (the protocol used by client<->broker comms is higher than what's present in Envoy), we just pass through the payloads (IMHO it would be bad to have Envoy impact communication in this particular filter - this is not going to be possible in "mesh filter" as we we'd need).

In detail the change includes:

  • kafka broker filter object & related config object,
  • configurability: metrics prefix (if someone ever decides to use single envoy instance for multiple brokers),
  • generated metrics lists (because we have metrics per message type type),
  • minor refactoring in response codec (expected response list kept in a separate object to reduce coupling); also a stack has been replaced with a map to make envoy less-intrusive wrt protocol ordering issues,
  • messaging_utilities test code library that's capable of making example payloads of various kinds (is going to be used for further tests).

Relates to #2852
Risk Level: Low (Kafka code is unused right now)
Testing: automated tests, manual testing with real Kafka broker and client
Docs Changes: Kafka broker filter added
Release Notes: n/a

@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #8188 was synchronize by adamkotwasinski.

see: more, trace.

Signed-off-by: Adam Kotwasinski <adam.kotwasinski@gmail.com>
Signed-off-by: Adam Kotwasinski <adam.kotwasinski@gmail.com>
…eption handling in broker-level filter

Signed-off-by: Adam Kotwasinski <adam.kotwasinski@gmail.com>
… tests

Signed-off-by: Adam Kotwasinski <adam.kotwasinski@gmail.com>
…sts and use these utils instead

Signed-off-by: Adam Kotwasinski <adam.kotwasinski@gmail.com>
Signed-off-by: Adam Kotwasinski <adam.kotwasinski@gmail.com>
Signed-off-by: Adam Kotwasinski <adam.kotwasinski@gmail.com>
Signed-off-by: Adam Kotwasinski <adam.kotwasinski@gmail.com>
@adamkotwasinski
Copy link
Contributor Author

@mattklein123 I have changed the test to be manual + flaky for now.
The readme file in test directory encourages any future committer to re-run the test (just to catch something weird).

Given it's an integration test and it might run differently depending on machine speed, virtualization, other factors I can't predict today => WDYT about making it manual for now, and me working on making it become part of normal (automated) build in future.
A way to achieve it might be me basically sending a few emails on the mailing list for volunteers to run the manual test, report any issues if seen (hopefully zero), and if the scenario succeeds after a few tries => making the proper PR with removal of manual part.

Now waiting for the build to go green.

/wait

@mattklein123
Copy link
Member

WDYT about making it manual for now, and me working on making it become part of normal (automated) build in future.

Sure sounds good. Very excited for this to land and to figure out what we can build on top.

Signed-off-by: Adam Kotwasinski <adam.kotwasinski@gmail.com>
@adamkotwasinski
Copy link
Contributor Author

/wait (will re-request review when I finally get it to green)

…me during int tests

Signed-off-by: Adam Kotwasinski <adam.kotwasinski@gmail.com>
Signed-off-by: Adam Kotwasinski <adam.kotwasinski@gmail.com>
@adamkotwasinski
Copy link
Contributor Author

/wait-any

@adamkotwasinski
Copy link
Contributor Author

Ready for re-review.
The integration test is now fully parallelizable (basically 4 random ports are picked at startup, and retried if something complains) => there will be further steps to make automated in future.
Some v2->v3 proto migration happened in master, but I think I got everything working (basically copied the structure from zookeeper filter).

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely epic work. Let's ship an iterate!

@repokitteh-read-only repokitteh-read-only bot removed the api label Jan 6, 2020
@mattklein123 mattklein123 merged commit a60f685 into envoyproxy:master Jan 6, 2020
Comment on lines +307 to +309
sha256 = "ae7a1696c0a0302b43c5b21e515c37e6ecd365941f68a510a7e442eebddf39a1", # 2.2.0-rc2
strip_prefix = "kafka-2.2.0-rc2/clients/src/main/resources/common/message",
urls = ["https://github.com/apache/kafka/archive/2.2.0-rc2.zip"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congrats on shipping this @adamkotwasinski. I suspect this version lines up with when you started the PR. Kafka is now at 2.4.0 - https://github.com/apache/kafka/releases/tag/2.4.0. Are there any blockers on moving from the old RC release to the latest stable release?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moderation there should be none - I'll take a look as soon as I can

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moderation work will be in #9582 ; I will need some more time to get the 2.4 running (the descriptor files used to generate the C++ code changed a little)

Comment on lines +313 to +314
strip_prefix = "kafka_2.12-2.2.0",
urls = ["http://us.mirrors.quenda.co/apache/kafka/2.2.0/kafka_2.12-2.2.0.tgz"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no stalebot Disables stalebot from closing an issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants