
BlacklistingChannel probation and per-endpoint info #349

Merged 6 commits into develop on Feb 18, 2020

Conversation

iamdanfox
Contributor

@iamdanfox iamdanfox commented Feb 17, 2020

Before this PR

  • Blacklisting was all-or-nothing (so one endpoint could stop traffic to an entire host)
  • After blacklisting, we'd send a deluge of requests to the newly unblacklisted host, which might still be broken, resulting in lots of failures.
  • Also, 500s wouldn't result in blacklisting.

(It's way easier to review this when I have the :simulation graphs to demonstrate what's broken)

After this PR

==COMMIT_MSG==
BlacklistingChannel keeps track of per-endpoint information and conservatively ramps up.
==COMMIT_MSG==
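To illustrate the "conservatively ramps up" idea, here is a minimal sketch (all names and thresholds are hypothetical illustrations, not the PR's actual implementation): after the blacklist window expires, the endpoint enters a probation phase where only a limited number of requests are let through until enough of them succeed.

```java
import java.time.Duration;

/** Hypothetical sketch of blacklist-then-probation state transitions. */
final class ProbationSketch {
    enum State { HEALTHY, BLACKLISTED, PROBATION }

    private final Duration blacklistDuration;
    private final int probationPermits;

    private State state = State.HEALTHY;
    private long blacklistedUntilNanos;
    private int remainingProbationPermits;

    ProbationSketch(Duration blacklistDuration, int probationPermits) {
        this.blacklistDuration = blacklistDuration;
        this.probationPermits = probationPermits;
    }

    /** Record a failure: blacklist the endpoint for the configured duration. */
    void onFailure(long nowNanos) {
        state = State.BLACKLISTED;
        blacklistedUntilNanos = nowNanos + blacklistDuration.toNanos();
    }

    /** Record a success: probation permits are worked off until fully healthy. */
    void onSuccess() {
        if (state == State.PROBATION && --remainingProbationPermits <= 0) {
            state = State.HEALTHY;
        }
    }

    /** Decide whether a request may be sent right now. */
    boolean tryAcquire(long nowNanos) {
        if (state == State.BLACKLISTED) {
            if (nowNanos < blacklistedUntilNanos) {
                return false; // still blacklisted: no traffic at all
            }
            // window expired: ramp up conservatively instead of sending a deluge
            state = State.PROBATION;
            remainingProbationPermits = probationPermits;
        }
        return true;
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        ProbationSketch sketch = new ProbationSketch(Duration.ofSeconds(10), 2);
        sketch.onFailure(0L);
        System.out.println(sketch.tryAcquire(1L)); // false: inside the window
        System.out.println(sketch.tryAcquire(Duration.ofSeconds(11).toNanos())); // true: probation begins
        System.out.println(sketch.state()); // PROBATION
        sketch.onSuccess();
        sketch.onSuccess();
        System.out.println(sketch.state()); // HEALTHY after enough successes
    }
}
```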

Possible downsides?

@changelog-app

changelog-app bot commented Feb 17, 2020

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

BlacklistingChannel keeps track of per-endpoint information and conservatively ramps up.

Check the box to generate changelog(s)

  • Generate changelog entry

this.duration = duration;
this.ticker = ticker;
this.perEndpointBlacklistState =
Caffeine.newBuilder().maximumSize(1000).ticker(ticker).build();
Contributor

per-endpoint state feels a bit odd here. At the channel level I think we'd want to track health of the channel as a whole. I wonder if we could invert the relationship between channel and Endpoint to allow endpoints to be wrapped similarly to conjure-undertow?

Contributor Author

Is this something that you want to nail down before moving forwards with this PR? (We've already got a Cache<Endpoint, Limiter> in our concurrency limiters...)

This BlacklistingChannel is a key ingredient for an alternative node selection strategy I've been working on, but I'd really like to land this PR, as it's not super usable as-is.
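For readers following along, the Cache<Endpoint, Limiter> shape mentioned above amounts to lazily materializing mutable state per key. A dependency-free sketch of that pattern with ConcurrentHashMap (Caffeine's Cache adds bounded size via maximumSize and a pluggable Ticker on top; the key and field names here are hypothetical):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Dependency-free sketch of lazily-created per-endpoint state. */
final class PerEndpointState {

    /** Hypothetical mutable state; the real state classes live in this PR. */
    static final class EndpointState {
        volatile long blacklistedUntilNanos;
    }

    // Keyed by a hypothetical endpoint name; dialogue keys on its Endpoint type.
    private final ConcurrentMap<String, EndpointState> states = new ConcurrentHashMap<>();

    /** Mirrors Cache.get(key, loader): create on first access, reuse afterwards. */
    EndpointState stateFor(String endpoint) {
        return states.computeIfAbsent(endpoint, key -> new EndpointState());
    }

    public static void main(String[] args) {
        PerEndpointState tracker = new PerEndpointState();
        System.out.println(tracker.stateFor("getFoo") == tracker.stateFor("getFoo")); // true: reused
        System.out.println(tracker.stateFor("getFoo") == tracker.stateFor("putBar")); // false: distinct
    }
}
```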

}

// I wish java had union types
interface BlacklistState {}
Contributor

derive4j is your friend here

Contributor Author

So I actually really don't want to have the extra faff and tooling setup of derive4j - can you forgive these two purely implementation-detail classes?
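For context, the union-via-marker-interface pattern under discussion looks roughly like this (the variant classes and fields here are hypothetical illustrations, not the PR's actual code):

```java
/** Marker interface standing in for a union (sum) type. */
interface BlacklistState {}

/** Variant 1: the endpoint is blacklisted until a deadline. */
final class BlacklistedUntil implements BlacklistState {
    final long untilNanos;

    BlacklistedUntil(long untilNanos) {
        this.untilNanos = untilNanos;
    }
}

/** Variant 2: the endpoint is on probation with limited permits. */
final class Probation implements BlacklistState {
    final int remainingPermits;

    Probation(int remainingPermits) {
        this.remainingPermits = remainingPermits;
    }
}

final class UnionDemo {
    /** Branching by instanceof: Java 8's closest equivalent to pattern matching. */
    static String describe(BlacklistState state) {
        if (state instanceof BlacklistedUntil) {
            return "blacklisted until " + ((BlacklistedUntil) state).untilNanos;
        } else if (state instanceof Probation) {
            return "probation with " + ((Probation) state).remainingPermits + " permits";
        }
        throw new IllegalStateException("unknown state: " + state);
    }

    public static void main(String[] args) {
        System.out.println(describe(new BlacklistedUntil(42L)));
        System.out.println(describe(new Probation(3)));
    }
}
```

derive4j, which the reviewer suggests, would generate the variants plus an exhaustive match method so the compiler flags unhandled cases; sealed interfaces in later Java versions make this pattern first-class.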

@ferozco
Contributor

ferozco commented Feb 18, 2020

👍

@iamdanfox
Contributor Author

FWIW Channels#create doesn't currently use BlacklistingChannel - it was a piece of floating dead code... but maybe not for long ;)

@bulldozer-bot bulldozer-bot bot merged commit 96ebe45 into develop Feb 18, 2020
@bulldozer-bot bulldozer-bot bot deleted the dfox/better-blacklisting branch February 18, 2020 23:11
@carterkozak
Contributor

carterkozak commented Feb 18, 2020 via email

@iamdanfox
Contributor Author

Ah right, fair enough. The concurrency limiter's behaviour only changes when a request goes in or comes back (which triggers the queued work to be scheduled), whereas blacklisting is time-based: we could have a spike of requests that all get queued, and if no further requests come in, they'd just sit on the queue forever.
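A sketch of the fix that distinction would require (hypothetical code, not from this PR): because unblacklisting is triggered by the clock rather than by a request event, anything queued must schedule its own wake-up to drain, or it sits there until the next request happens to arrive.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: time-based unblacklisting must schedule its own drain. */
final class TimedDrainSketch {
    private final Queue<Runnable> queued = new ArrayDeque<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /**
     * Queue a request while blacklisted, and schedule a drain at expiry. Without
     * the scheduled call, nothing would ever run these requests if no further
     * traffic arrives -- the failure mode described in the comment above.
     */
    synchronized void enqueue(Runnable request, long delayMillis) {
        queued.add(request);
        scheduler.schedule(this::drain, delayMillis, TimeUnit.MILLISECONDS);
    }

    private synchronized void drain() {
        Runnable next;
        while ((next = queued.poll()) != null) {
            next.run();
        }
    }

    void shutdown() {
        scheduler.shutdown();
    }

    public static void main(String[] args) throws Exception {
        TimedDrainSketch sketch = new TimedDrainSketch();
        java.util.concurrent.CountDownLatch latch = new java.util.concurrent.CountDownLatch(1);
        sketch.enqueue(latch::countDown, 10);
        // The queued request runs once the scheduled drain fires, with no
        // follow-up request needed to kick it.
        System.out.println(latch.await(1, TimeUnit.SECONDS));
        sketch.shutdown();
    }
}
```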
