add a groups reaper to remove non existing groups #657

Merged: 9 commits, Feb 24, 2021

@d1egoaz d1egoaz commented Sep 11, 2020

Burrow currently keeps reporting lag for consumer groups that no longer exist.

The only way to remove groups from Burrow automatically is to configure
expire-group, which is not ideal because it can conflict with consumer
groups that have no members.

This PR introduces a goroutine that fetches the existing consumer groups from
Kafka and compares them against Burrow's consumers to reap the ones that no longer exist.
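
For illustration, the comparison described above amounts to a set difference. This is a minimal sketch, not the code from this PR: `staleGroups`, the `payments` group, and the `consumer` type value are hypothetical, and skipping Burrow's own group is an assumption based on the `burrow-local` group that appears in the curl output.

```go
package main

import (
	"fmt"
	"sort"
)

// staleGroups returns the groups Burrow tracks that Kafka no longer reports.
// kafkaGroups uses the map[string]string shape (group name -> group type);
// only its keys are consulted. ignore is a group that must never be reaped.
func staleGroups(burrowGroups []string, kafkaGroups map[string]string, ignore string) []string {
	var stale []string
	for _, g := range burrowGroups {
		if g == ignore {
			continue
		}
		if _, ok := kafkaGroups[g]; !ok {
			stale = append(stale, g)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	burrowGroups := []string{"diego", "burrow-local", "payments"}
	kafkaGroups := map[string]string{"payments": "consumer"}
	fmt.Println(staleGroups(burrowGroups, kafkaGroups, "burrow-local"))
}
```

Each name returned would then be turned into a delete request for Burrow's storage module.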

🎩

╰─○ curl localhost:8000/v3/kafka/local/consumer
{"error":false,"message":"consumer list returned","consumers":["burrow-local"],"request":{"url":"/v3/kafka/local/consumer","host":"48fa52eee460"}}% 

add a consumer group:

╰─○ kafkacat -Cb localhost:9092 -G diego test-topic
% Waiting for group rebalance
% Group diego rebalanced (memberid rdkafka-ca2754f1-25f5-481e-b6c0-defd28513caf): assigned: test-topic [0], test-topic

check burrow after a few seconds:

╰─○ curl localhost:8000/v3/kafka/local/consumer
{"error":false,"message":"consumer list returned","consumers":["diego","burrow-local"],"request":{"url":"/v3/kafka/local/consumer","host":"48fa52eee460"}}%                                                                                   
╭─[16:04:52] diegoalvarez@d1egoaz-MBP/ ~/src/github.com/Shopify/burrow

╰─○ curl localhost:8000/v3/kafka/local/consumer/diego
{"error":false,"message":"consumer detail returned","topics":{"test-topic":[{"offsets":[null,null,null,null,null,null,null,{"offset":4,"timestamp":1599865466521,"lag":0},{"offset":5,"timestamp":1599865486526,"lag":0},{"offset":9,"timestamp":1599865491523,"lag":0}],"owner":"","client_id":"","current-lag":0},{"offsets":[null,null,null,null,null,null,null,null,null,{"offset":5,"timestamp":1599865466521,"lag":0}],"owner":"","client_id":"","current-lag":0}]},"request":{"url":"/v3/kafka/local/consumer/diego","host":"48fa52eee460"}}% 

stop the consumer and delete the consumer group, or wait until the consumer offsets retention expires:

╰─○ kafka-consumer-groups --delete --group diego --bootstrap-server localhost:9092
Deletion of requested consumer groups ('diego') was successful.

kafka reports the deleted consumer group:

kafka_1      | [2020-09-11 23:05:38,101] INFO [GroupCoordinator 1001]: The following groups were deleted: diego. A total of 2 offsets were removed. (kafka.coordinator.group.GroupCoordinator)

then the groups reaper deletes this consumer:

burrow_1     | {"level":"info","ts":1599865553.5991886,"msg":"groups reaper: removing non existing kafka consumer group (diego) from burrow","type":"module","coordinator":"cluster","class":"kafka","name":"local"}

burrow stops reporting lag as the group is removed:

╰─○ curl localhost:8000/v3/kafka/local/consumer/diego
{"error":true,"message":"cluster or consumer not found","request":{"url":"/v3/kafka/local/consumer/diego","host":"48fa52eee460"}}%                                                                                                            
╭─[16:05:55] diegoalvarez@d1egoaz-MBP/ ~/src/github.com/Shopify/burrow

╰─○ curl localhost:8000/v3/kafka/local/consumer
{"error":false,"message":"consumer list returned","consumers":["burrow-local"],"request":{"url":"/v3/kafka/local/consumer","host":"48fa52eee460"}}% 

it also fixes:
#589

@d1egoaz d1egoaz requested a review from bai as a code owner September 11, 2020 23:39
calling `.Stop` on a ticker will not close the channel, so consumers
won't get triggered
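
A minimal sketch of the behavior this commit message relies on, using only the standard library. Per the `time` package documentation, `Ticker.Stop` does not close the ticker's channel, so a goroutine receiving from it keeps blocking instead of waking up. `receivedAfterStop` is a hypothetical helper written for this example.

```go
package main

import (
	"fmt"
	"time"
)

// receivedAfterStop reports whether a receive on a stopped ticker's channel
// completes within wait. A closed channel would yield immediately; a stopped
// ticker's channel stays open, so the receive blocks until the timeout fires.
func receivedAfterStop(wait time.Duration) bool {
	ticker := time.NewTicker(time.Hour) // long period, so no tick is pending
	ticker.Stop()
	select {
	case <-ticker.C:
		return true
	case <-time.After(wait):
		return false
	}
}

func main() {
	fmt.Println(receivedAfterStop(50 * time.Millisecond))
}
```

This is why a `select` loop can keep a case for a stopped ticker: that case never fires.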

@andrewjamesbrown andrewjamesbrown left a comment

LGTM.

@@ -98,6 +103,19 @@ func (module *KafkaCluster) Start() error {
	// Start main loop that has a timer for offset and topic fetches
	module.offsetTicker = time.NewTicker(time.Duration(module.offsetRefresh) * time.Second)
	module.metadataTicker = time.NewTicker(time.Duration(module.topicRefresh) * time.Second)

	if module.groupsReaperRefresh != 0 {
		module.groupsReaperTicker = time.NewTicker(time.Duration(module.groupsReaperRefresh) * time.Second)

groupsReaperRefreshInSeconds

Contributor Author

I'm following the current code conventions: offsetRefresh and topicRefresh don't have seconds as part of their names either.


	if module.groupsReaperRefresh != 0 {
		module.groupsReaperTicker = time.NewTicker(time.Duration(module.groupsReaperRefresh) * time.Second)
		if !module.saramaConfig.Version.IsAtLeast(sarama.V0_11_0_0) {

Is there not a variable like sarama.MIN_VERSION? I feel we may forget to update this value.

Contributor Author

We never need to update this value; it only checks that the current client can use the new API added in protocol v0.11.0.

		module.groupsReaperTicker = time.NewTicker(time.Duration(module.groupsReaperRefresh) * time.Second)
		if !module.saramaConfig.Version.IsAtLeast(sarama.V0_11_0_0) {
			module.groupsReaperTicker.Stop()
			module.Log.Warn("groups reaper disabled, it needs at least kafka v0.11.0.0 to get the list of consumer groups")

I feel purge or cleanup would sound more intuitive.

Contributor Author

At least for me, reaper feels more intuitive; I've seen this word used widely in the Kafka and Java code bases:
[image]


req := &protocol.StorageRequest{
	RequestType: protocol.StorageFetchConsumers,
	Reply:       make(chan interface{}),

Why no type here? interface{}

Contributor Author

That's the type of Reply; we cannot use a concrete type here.
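
A minimal sketch of why a shared reply channel ends up typed as `interface{}`, assuming one request struct serves several request kinds whose replies have different types. The `request` type, `serve`, and `fetchConsumers` are hypothetical stand-ins for this example, not Burrow's protocol types. The caller narrows the reply with a comma-ok type assertion, as the `res.([]string)` line in the diff does.

```go
package main

import "fmt"

// request is a hypothetical stand-in for a storage request. One Reply channel
// is shared by every request kind, so it carries interface{} values.
type request struct {
	Kind  string
	Reply chan interface{}
}

// serve answers a single request with a value whose type depends on the kind.
func serve(requests <-chan *request) {
	req := <-requests
	switch req.Kind {
	case "consumers":
		req.Reply <- []string{"group1", "group2"}
	default:
		req.Reply <- fmt.Errorf("unknown request kind %q", req.Kind)
	}
}

// fetchConsumers sends one request and narrows the reply back to []string.
// ok is false when the reply was not a []string.
func fetchConsumers(kind string) ([]string, bool) {
	requests := make(chan *request)
	go serve(requests)
	req := &request{Kind: kind, Reply: make(chan interface{})}
	requests <- req
	groups, ok := (<-req.Reply).([]string)
	return groups, ok
}

func main() {
	fmt.Println(fetchConsumers("consumers"))
	fmt.Println(fetchConsumers("bogus"))
}
```

Checking `ok` rather than discarding it lets the caller tell an empty list apart from an unexpected reply type.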

// TODO: find how to get reportedConsumerGroup from KafkaClient
burrowIgnoreGroupName := "burrow-" + module.name
burrowGroups, _ := res.([]string)
for _, g := range burrowGroups {

Single letter variable name is hard to read.

g -> burrowGroup

Contributor Author

it's a single letter, easy to read :D

The variable is local and its scope is limited, so there is no need for a longer variable name; this is the usual Go practice.

	Cluster: module.name,
	Group:   g,
}
helpers.TimeoutSendStorageRequest(module.App.StorageChannel, request, 1)

Is it processed in another routine?

Contributor Author

yes, it'll eventually be processed by the storage channel reader
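
A minimal sketch of the send-with-timeout pattern that a helper like `TimeoutSendStorageRequest` suggests, assuming it is built on `select` with `time.After`. The `sendWithTimeout` function and its string payload are hypothetical and are not Burrow's helper. The sender only hands the message over; a separate goroutine reading the channel does the processing.

```go
package main

import (
	"fmt"
	"time"
)

// sendWithTimeout tries to hand msg to ch and gives up after wait.
// It reports whether a reader (or free buffer slot) accepted the message.
func sendWithTimeout(ch chan<- string, msg string, wait time.Duration) bool {
	select {
	case ch <- msg:
		return true
	case <-time.After(wait):
		return false
	}
}

func main() {
	// With a reader goroutine, the send is accepted and processed elsewhere.
	ch := make(chan string)
	done := make(chan struct{})
	go func() {
		fmt.Println("reader got:", <-ch)
		close(done)
	}()
	fmt.Println("sent:", sendWithTimeout(ch, "delete group diego", time.Second))
	<-done

	// With no reader, the send times out instead of blocking forever.
	idle := make(chan string)
	fmt.Println("sent:", sendWithTimeout(idle, "delete group diego", 50*time.Millisecond))
}
```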

assert.Equalf(t, protocol.StorageFetchConsumers, request.RequestType, "Expected request sent with type StorageFetchConsumers, not %v", request.RequestType)
assert.Equalf(t, "test", request.Cluster, "Expected request sent with cluster test, not %v", request.Cluster)

request.Reply <- []string{"group1", "group2"}

I feel you have more than one test in the same test method. Split 🙏

@@ -205,6 +205,9 @@ type SaramaClient interface {
	// NewConsumerFromClient creates a new consumer using the given client. It is still necessary to call Close() on the
	// underlying client when shutting down this consumer.
	NewConsumerFromClient() (sarama.Consumer, error)

	// List the consumer groups available in the cluster.
	ListConsumerGroups() (map[string]string, error)

Why is the value type of the map string?

Contributor Author

@d1egoaz d1egoaz Sep 22, 2020

It's related to the return type in Sarama, which returns a map of consumer group to consumer group type. Here we only care whether the consumer group exists. A set would be great, but Go doesn't have one, so we still need a map whose values we ignore. I could convert the type to map[string]struct{}, but I decided not to: there are no performance gains, and it would be more wasteful to create another map just to get rid of the unwanted value.
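
A minimal sketch of the trade-off described in this reply. Membership can be tested on the `map[string]string` directly by ignoring the value, while converting to `map[string]struct{}` costs an extra allocation and a full copy of the keys. `exists` and `toSet` are hypothetical helpers, and the `consumer` type value is illustrative.

```go
package main

import "fmt"

// exists tests membership using only the map's keys; the value is ignored.
func exists(groups map[string]string, name string) bool {
	_, ok := groups[name]
	return ok
}

// toSet copies the keys into a map[string]struct{}. It answers the same
// membership question, but only after allocating and filling a second map.
func toSet(groups map[string]string) map[string]struct{} {
	set := make(map[string]struct{}, len(groups))
	for name := range groups {
		set[name] = struct{}{}
	}
	return set
}

func main() {
	groups := map[string]string{"diego": "consumer"}
	fmt.Println(exists(groups, "diego"), exists(groups, "gone"))

	_, ok := toSet(groups)["diego"]
	fmt.Println(ok)
}
```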

@maxboisvert maxboisvert left a comment

No blocking comments, but please address some of them 😸

@oleh-poberezhets

@bai How about this feature?

@d1egoaz
Contributor Author

d1egoaz commented Feb 23, 2021

@bai could you please 👀

@bai
Collaborator

bai commented Feb 24, 2021

Apologies for the delay, this fell through the cracks on my side. PR LGTM.

@bai bai merged commit 153b1f6 into linkedin:master Feb 24, 2021
@d1egoaz d1egoaz deleted the diego_groups-reaper branch February 24, 2021 17:14