
Burrow 1.0 - refresh metadata on leader failures #268

Merged

Conversation

toddpalino
Contributor

There's a case where fetching the leaders for partitions can fail. This adds the same RefreshMetadata call that is done later in the offset fetch process to the leader checks as well.
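
For context, a minimal sketch of the retry pattern described here, assuming a sarama.Client; the helper name fetchLeader is illustrative, not Burrow's actual code:

```go
package main

import "github.com/Shopify/sarama"

// fetchLeader looks up the leader broker for a partition. If the lookup
// fails (e.g. the leader moved after a broker failure), it refreshes the
// topic metadata and retries once.
func fetchLeader(client sarama.Client, topic string, partition int32) (*sarama.Broker, error) {
	broker, err := client.Leader(topic, partition)
	if err == nil {
		return broker, nil
	}
	// Same RefreshMetadata call the offset fetch path already uses.
	if refreshErr := client.RefreshMetadata(topic); refreshErr != nil {
		return nil, refreshErr
	}
	return client.Leader(topic, partition)
}
```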

@toddpalino toddpalino merged commit 16dfb09 into linkedin:burrow-1.0-RC Nov 13, 2017
@toddpalino toddpalino deleted the burrow-1.0-deletion-fix branch November 14, 2017 21:15
toddpalino added a commit that referenced this pull request Dec 1, 2017
* Merge burrow-1.0 RC branch

* Burrow 1.0 Release Candidate (#258)

* Replace burrow with the proposed 1.0 framework
Look, it's essentially a complete rewrite. There's almost nothing left of the original code here, and none of the modules have been fleshed out yet.

The overall changes:
* Make burrow itself a lib wrapped with main, so we can wrap it inside other applications
* Move to a modular framework with well-defined interfaces between components
* Switch logging to uber/zap and lumberjack (see the logging sketch after this list)
* Start with being able to have parallel operation (notifier active everywhere) so we can share load between instances
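
A minimal sketch of the zap-plus-lumberjack wiring mentioned above, assuming standard usage of both libraries; the filename and rotation settings are placeholders, not Burrow's configuration:

```go
package main

import (
	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
	"gopkg.in/natefinch/lumberjack.v2"
)

func main() {
	// lumberjack handles file rotation; zap writes structured JSON to it.
	rotator := &lumberjack.Logger{
		Filename:   "burrow.log", // placeholder path
		MaxSize:    100,          // MB before rotating
		MaxBackups: 10,
		MaxAge:     30, // days
	}
	core := zapcore.NewCore(
		zapcore.NewJSONEncoder(zap.NewProductionEncoderConfig()),
		zapcore.AddSync(rotator),
		zap.InfoLevel,
	)
	logger := zap.New(core)
	defer logger.Sync()
	logger.Info("burrow starting", zap.String("module", "main"))
}
```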

* Restructure a bit to resolve import cycles

* Make sure to gitignore the built binary

* Move modules to internal packages

* Tweak logging to work on windows

* Clean up coordinators a little more

* Fix syscalls for unix vs windows

* First pass at inmemory storage module

* tests for inmemory, and fixes found during testing

* Additional tests to make sure channels are closed after replies

* Actually start the mainLoop

* Ensure only one storage module is allowed, and add coordinator tests

* Fix storage code and tests for problems found while testing evaluators

* Add a fixture for storage to create a coordinator with storage module for testing code outside storage

* Fixes to evaluator code based on testing

* Tests for the evaluator coordinator and caching module

* Add a fixture for the evaluator that other testing can use

* Add start/stop and multiple request tests for the evaluator coordinator

* Remove extra parens

* Fix config name

* Add group whitelists to storage module, along with tests

* Fix a potential bug in min-distance where we would never create a new offset

* More logging

* Add a group delete request for storage modules

* Added expiration of group data via lazy deletion on request
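
A hedged sketch of the lazy-deletion idea: expired group data is dropped at read time rather than by a background reaper. The types here are illustrative, not the inmemory module's real structures:

```go
package main

import (
	"fmt"
	"time"
)

type groupData struct {
	lastCommit time.Time
}

type store struct {
	groups map[string]*groupData
	expiry time.Duration
}

// getGroup returns a group's data, deleting it on the spot if its last
// commit is older than the expiry window (lazy deletion on request).
func (s *store) getGroup(name string) (*groupData, bool) {
	g, ok := s.groups[name]
	if !ok {
		return nil, false
	}
	if time.Since(g.lastCommit) > s.expiry {
		delete(s.groups, name)
		return nil, false
	}
	return g, true
}

func main() {
	s := &store{groups: map[string]*groupData{}, expiry: 7 * 24 * time.Hour}
	s.groups["example"] = &groupData{lastCommit: time.Now()}
	_, ok := s.getGroup("example")
	fmt.Println("group present:", ok)
}
```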

* First pass at cluster module for kafka with limited tests

* Add a shim interface for sarama.Client and sarama.Broker
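
A sketch of the shim pattern named here: declare a local interface covering just the sarama methods the module calls, so tests can substitute a mock. The method subset below is illustrative, not Burrow's full shim:

```go
package cluster

import "github.com/Shopify/sarama"

// SaramaClient is a local shim over the subset of sarama.Client that the
// module uses; tests can implement it with a hand-rolled mock.
type SaramaClient interface {
	Partitions(topic string) ([]int32, error)
	Leader(topic string, partition int32) (*sarama.Broker, error)
	RefreshMetadata(topics ...string) error
	Close() error
}

// The real sarama.Client already satisfies the shim, so production code
// passes it through unchanged.
var _ SaramaClient = sarama.Client(nil)
```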

* Switch kafka cluster module to use the shim interface for sarama

* Add tests for the rest of the kafka cluster module

* Add a storage request for setting partition owner for a group

* Add kafka_client consumer module and tests

* Add consumer coordinator tests

* Move the storage request send helper to a new file

* Refactor names for the sarama shims

* Add a shim for go-zookeeper so we'll be able to test

* Implement the kafkazk consumer module and tests

* Add tests for validation routines

* comment fix

* Add tests for helpers

* Add whitelist support to consumers

* Have the PID creator also check if the process exists before exiting
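
A hedged sketch of the check this adds: before treating an existing PID file as fatal, probe whether that process is actually alive (signal 0 on Unix tests existence without side effects). The helper name is illustrative:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"syscall"
)

// pidIsRunning reports whether the process recorded in pidFile exists.
func pidIsRunning(pidFile string) bool {
	data, err := os.ReadFile(pidFile)
	if err != nil {
		return false
	}
	pid, err := strconv.Atoi(strings.TrimSpace(string(data)))
	if err != nil {
		return false
	}
	proc, err := os.FindProcess(pid) // always succeeds on Unix
	if err != nil {
		return false
	}
	// Signal 0 delivers nothing but fails if the process is gone.
	return proc.Signal(syscall.Signal(0)) == nil
}

func main() {
	fmt.Println(pidIsRunning("burrow.pid"))
}
```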

* Restructure main ZK as a coordinator to use the common interface

* Start notifiers, clean up some testing

* Add tests for HTTP notifier module

* Refactor notifier coordinator to move common logic out of the modules

* Refactor notifier whitelist and threshold accept logic to coordinator

* Move template execution up to a coordinator method for consistency

* Email notifier

* Slack notifier and tests

* Use asserts instead of panics for the HTTP tests

* Fix a case in the storage fixture where it won't get all the commits

* Check http notifier profile configs

* Make maxlag template helper use the CurrentLag field

* Rename NotifierModule to just Module

* Rename StorageModule to just Module

* Rename EvaluatorModule to just Module

* Add support for ZK locks, as well as tests
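
A minimal sketch of taking a ZooKeeper lock with go-zookeeper's stock lock recipe; the server address and lock path are placeholders:

```go
package main

import (
	"log"
	"time"

	"github.com/samuel/go-zookeeper/zk"
)

func main() {
	conn, _, err := zk.Connect([]string{"localhost:2181"}, 10*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// zk.NewLock implements the standard ZooKeeper lock recipe using
	// ephemeral sequential znodes under the given path.
	lock := zk.NewLock(conn, "/burrow/notifier/lock", zk.WorldACL(zk.PermAll))
	if err := lock.Lock(); err != nil { // blocks until acquired
		log.Fatal(err)
	}
	defer lock.Unlock()
	// ... work done only while holding the lock ...
}
```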

* Add a ticker that can be stopped and restarted
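
A hedged sketch of a stoppable/restartable ticker of the kind this commit describes; the type name and details are illustrative, not Burrow's implementation:

```go
package helpers

import "time"

// Ticker wraps time.Ticker so it can be stopped and later restarted
// while consumers keep reading from the same channel C.
type Ticker struct {
	C        chan time.Time
	interval time.Duration
	quit     chan struct{}
}

func NewTicker(d time.Duration) *Ticker {
	return &Ticker{C: make(chan time.Time), interval: d}
}

func (t *Ticker) Start() {
	t.quit = make(chan struct{})
	inner := time.NewTicker(t.interval)
	go func() {
		defer inner.Stop()
		for {
			select {
			case now := <-inner.C:
				select {
				case t.C <- now: // deliver the tick
				case <-t.quit:
					return
				}
			case <-t.quit:
				return
			}
		}
	}()
}

// Stop halts ticks; Start may be called again afterwards.
func (t *Ticker) Stop() { close(t.quit) }
```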

* Make the notifier coordinator use a ZK lock with the restartable ticker

* Add HTTP server and tests

* Update dependencies

* Clean up HTTP tests so we test the router configuration code

* Few more HTTP server tests, and flesh out log level set/get

* Reorder imports

* Fix copyright comments

* Formatting cleanup

* Set httprouter to master, since it hasn't had a release in 2 years

* touch up logging

* Remember to set the config as valid

* Use master branch of testify

* Updates found in testing

* Check for null fields in member metadata

* Fixes to metadata handling

* Add a worker pool for inmemory to consistently process groups
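
A sketch of one common reading of "consistently process groups": hash each group name to a fixed worker so a given group is always handled, in order, by the same goroutine. This is an illustrative pattern, not the module's exact code:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

type request struct{ group string }

func main() {
	const numWorkers = 4
	var wg sync.WaitGroup
	workers := make([]chan request, numWorkers)
	for i := range workers {
		workers[i] = make(chan request, 16)
		wg.Add(1)
		go func(ch chan request) {
			defer wg.Done()
			for req := range ch {
				fmt.Println("processing", req.group) // real work goes here
			}
		}(workers[i])
	}

	// dispatch routes a request to the worker owning its group, so all
	// requests for one group are serialized on a single goroutine.
	dispatch := func(req request) {
		h := fnv.New32a()
		h.Write([]byte(req.group))
		workers[h.Sum32()%numWorkers] <- req
	}
	dispatch(request{group: "example-consumer"})

	for _, ch := range workers {
		close(ch)
	}
	wg.Wait()
}
```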

* Remove the kafka_client mainLoop, as it's not useful

* Fix formatting and a duplicate logging field

* Add support for CORS headers on the HTTP server

* Add a template helper for formatting timestamps using normal Time format strings
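
A minimal sketch of such a helper, assuming notification timestamps in milliseconds; the helper name and layout below are illustrative:

```go
package main

import (
	"os"
	"text/template"
	"time"
)

func main() {
	tmpl := template.Must(template.New("notify").Funcs(template.FuncMap{
		// formattimestamp converts a millisecond timestamp using Go's
		// reference-time layout strings (e.g. "2006-01-02 15:04:05").
		"formattimestamp": func(ts int64, layout string) string {
			return time.Unix(0, ts*int64(time.Millisecond)).UTC().Format(layout)
		},
	}).Parse(`lag alert at {{formattimestamp .Timestamp "2006-01-02 15:04:05"}}` + "\n"))

	_ = tmpl.Execute(os.Stdout, struct{ Timestamp int64 }{Timestamp: 1512086400000})
}
```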

* Add support for basic auth in the HTTP notifier
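
A hedged sketch of what basic-auth support in an HTTP notifier typically looks like, using the standard library's SetBasicAuth; the parameters are placeholders for whatever the profile config carries:

```go
package notifier

import (
	"net/http"
	"strings"
)

// sendNotification posts a rendered template to the notifier endpoint,
// attaching basic-auth credentials when a username is configured.
func sendNotification(url, username, password, payload string) (*http.Response, error) {
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if username != "" {
		req.SetBasicAuth(username, password)
	}
	return http.DefaultClient.Do(req)
}
```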

* Refactor config to use viper instead of gcfg
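
A minimal sketch of the viper pattern, with placeholder keys rather than Burrow's exact configuration schema:

```go
package main

import (
	"fmt"
	"log"

	"github.com/spf13/viper"
)

func main() {
	viper.SetConfigName("burrow") // finds burrow.{toml,yaml,json,...}
	viper.AddConfigPath(".")
	// Defaults apply when the file omits a key.
	viper.SetDefault("httpserver.default.address", ":8000")
	if err := viper.ReadInConfig(); err != nil {
		log.Fatal(err)
	}
	fmt.Println(viper.GetString("httpserver.default.address"))
}
```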

* add more logging in Kafka clients, and fix config loading

* fix typo in client-id config string

* Catch errors when starting coordinators

* Log the http listener info

* Clean up some of the logging

* Fix logging and notifiers from testing (#259)

* Fix notifier logic in 1.0 (#261)

* Fix how the extras field is pulled into the HTTP response structs

* Make sure the module accept group is always called

* Pause before testing stop on the storage coordinator

* Fix conditions where notifications are sent, and add a much more robust test

* 1.0 - Add jitter to notifier evaluations (#263)

* Change the evaluation loop so each consumer group's evaluations start on their own jittered timer
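
A sketch of the jitter idea: each group's evaluation loop begins after a random delay within the interval, so evaluations spread out instead of all firing on the same tick. Names are illustrative:

```go
package notifier

import (
	"math/rand"
	"time"
)

// startEvaluations launches one evaluation loop per group, staggered by
// a random initial delay in [0, interval).
func startEvaluations(groups []string, interval time.Duration, evaluate func(string)) {
	for _, group := range groups {
		go func(group string) {
			time.Sleep(time.Duration(rand.Int63n(int64(interval))))
			ticker := time.NewTicker(interval)
			defer ticker.Stop()
			for range ticker.C {
				evaluate(group)
			}
		}(group)
	}
}
```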

* reorder imports

* Burrow 1.0 config defaults (#264)

* Add owners to consumer group status response

* If no storage module configured, use a default

* If no evaluator module configured, use a default

* Fix default http server

* ConfigurationValid gets set by Start, not before

* cleanup methods that don't need to be exported

* Burrow 1.0 group blacklist (#266)

* Add group blacklists

* Reduce logging level for storage purging expired groups

* Start evaluator and httpserver before clusters/consumers

* Remove the requirement that you must have a cluster and consumer module defined

* Explicitly update metadata for topics that had errors fetching offsets (#267)

* Refresh metadata on leader failures as well (#268)

* Make sure that whenever we are reading the cluster map in the notifier, we have a lock (#269)

* Burrow 1.0 - No negative lag (#271)

* Lag values should always be unsigned ints
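
A minimal sketch of the rule: with lag stored as an unsigned int, the subtraction must be clamped so a committed offset ahead of the last fetched high-water mark yields zero rather than underflowing:

```go
package storage

// computeLag clamps at zero: the commit may be newer than the last
// fetched broker offset, which would otherwise underflow the uint64.
func computeLag(highWaterMark, committed int64) uint64 {
	if committed >= highWaterMark {
		return 0
	}
	return uint64(highWaterMark - committed)
}
```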

* unnecessary cast

* Update deps

* Start notifier before clusters and consumers (#272)

* Remove slack notifier (#278)

* Remove slack notifier
* Add example slack templates

* Burrow 1.0 - Godocs for everything (#281)

* Godoc docs for everything, and resolve all golint issues

* Burrow 1.0 - Doc cleanup (#282)

* Update example configuration files

* Fix example email template

* Update docs