Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sarama ConsumerGroup Update #239

Closed
frairon opened this issue Mar 19, 2020 · 5 comments
Closed

Sarama ConsumerGroup Update #239

frairon opened this issue Mar 19, 2020 · 5 comments

Comments

@frairon
Copy link
Contributor

frairon commented Mar 19, 2020

In this issue we want to announce the upcoming release of a new and majorly refactored version of Goka.
Spoiler: there will be breaking changes, but nothing that can't be fixed with simple text-replace in your projects. So no worries. We'll provide a small migration guide, but most of the things will be obvious anyway.

So what are the major changes:

  • Removed sarama-cluster, replaced with sarama's ConsumerGroup
  • remove package kafka, move remaining files to root
  • introduce go mod
  • Sorry, removed support for librdkafka

We'll tag the version shortly so it can be tested. Let us know if there are any issues so we can quickly react.

frairon added a commit that referenced this issue Mar 19, 2020
Co-authored-by: frairon <franz.eichhorn@lovoo.com>
Co-authored-by: R053NR07 <jan.bickel@lovoo.com>
frairon added a commit that referenced this issue Mar 19, 2020
Co-authored-by: frairon <franz.eichhorn@lovoo.com>
Co-authored-by: R053NR07 <jan.bickel@lovoo.com>
@mattburman
Copy link

mattburman commented Apr 14, 2020

Hi @frairon, I've been evaluating the latest beta changes and writing a wrapper around it for our own usage. Previously we had some logic around view.Run which implemented retry behaviour with an exponential backoff.

Re-testing this logic with the new versions shows that view.Run no longer returns like before and it's reconnecting with its own logic. That's great that the retries are within Goka now and we can remove some of our own logic.

I think I've narrowed it down to view.Run now looping over each partition and calling CatchupForever: v0.1.4...v0.9.0-beta1#diff-6202d9bde3a4cbd1c7a28821646249d4
(GitHub seems to have a bug with links with hashes so try v0.1.4...v0.9.0-beta1 with #diff-6202d9bde3a4cbd1c7a28821646249d4 at the end or just go to view.go)

This is great, but it would be good to have some kind of control over the retry behaviour. We previously has view.Run wrapped with a configurable backoff using https://github.com/jpillora/backoff. What are your thoughts on this? For now I am OK with removing our custom logic and keeping the default retry behaviour implemented in Goka. I would still welcome any thoughts on making the behaviour configurable, however.

Additionally, it would be nice to expose the view state: possibly ViewStateIdle, ViewStateCatchUp, or ViewStateRunning? Ideally I want to be able to differentiate between being behind and a connection cant currently be established.

I would suggest any migration guide includes information about this change to view.Run because it took me a while to realise our previous backoff behaviour was no longer running.

@mattburman
Copy link

mattburman commented Apr 15, 2020

I have also found the view.Recovered method returns true whilst it is struggling to connect to kafka. Is this intended? I need some way of determining whether the app is connected to kafka for metrics. It looks like partitionStats.Status also returns 4 (PartitionRunning) when no connection to kafka can be established. Stalled returns false too

@mattburman
Copy link

I have also found the view.Recovered method returns true whilst it is struggling to connect to kafka. Is this intended? I need some way of determining whether the app is connected to kafka for metrics. It looks like partitionStats.Status also returns 4 (PartitionRunning) when no connection to kafka can be established. Stalled returns false too

Looks like #246 might provide capability to track connection status outside of goka, but in a bit of a hacky way.

  • store a Connected boolean on the struct implementing goka.Backoff
  • append the struct to a list of all the backoffs in the implementation of BackoffBuilder
  • set to false when Duration() is called
  • set to true when Reset() is called

Then some status checker can possibly loop over all of these backoff structs and check their status independently.

@frairon
Copy link
Contributor Author

frairon commented Apr 15, 2020

@mattburman you're on fire here :D.
Let's sum up the issues (and actually create different issues for them).
Backoff-Behavior is discussed and handled in #247.
I also just created an issue for the view state/connection state #248 and one for the migration guide #249.

frairon added a commit that referenced this issue Jul 9, 2020
* Co-authored-by: frairon <franz.eichhorn@lovoo.com>
Co-authored-by: R053NR07 <jan.bickel@lovoo.com>
* bugfix in shutdown/rebalance: correctly closing joins
* run update/request/response stats in own goroutine
* fix rebalancing by adding a copartitioning rebalance strategy
* updated readme for configuration, added changelog
* Open Storage in PartitionTable when performing Setup
* return trackOutput if stats are nil
* view.get fixed for uninitialized view
added lots of godoc
fixed many linter errors
added Open call when creating storage
* context stats tracking: use queueing mechanism to avoid race conditions
* Add simpleBackoff and add backoff options for processor and view
* added strings to streams helper
* #249 view example
* issue #249: migration guide, #241 panic documentation of context
* #248 exposing partition table's connection state to view
* Migration: describe offsetbug
* partition_table: implement autoreconnect in recovery mode
* bugfix goroutine-leak in statsloop in partition table
* #248: refactored state merger, bugfix race condition when shutting down the signal/observers, created example
* bugfix partition state notification
* remove change in example, updated changelog
* fix readme example and add readme as example
* restore 1-simplest fix, remove readme-example
Co-authored-by: Jan Bickel <jan.bickel@lovoo.com>
@frairon
Copy link
Contributor Author

frairon commented Jul 10, 2020

The PR is merged, the version is released (v1.0.0). I'll close the issue, feel free to open new ones for possible bugs/new issues

@frairon frairon closed this as completed Jul 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants