Tracking issue for peer blacklisting #520
What is wrong?
In working on #24 I have found that we first need to implement better blacklisting. Without it, we end up with a pool filled with lots of mediocre or bad peers. A loose evaluation suggests the following end up being problems.
How can it be fixed?
I'm looking at doing the following.
First: New internal plugin that runs the blacklist tracker as a service over the event bus. This is needed because we need to communicate with this service from many different parts of the codebase and it's not really viable to pass around a reference to the tracker to all of these different code locations.
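Since the plugin and bus APIs aren't shown in this issue, here is a minimal sketch of what a bus-backed tracker could look like; the event names, `EventBus` class, and `reply` callback are all hypothetical stand-ins for whatever the codebase actually provides:

```python
import time
from collections import defaultdict

class EventBus:
    """Hypothetical minimal event bus: components publish events by name."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event, handler):
        self._subscribers[event].append(handler)

    def publish(self, event, **payload):
        for handler in self._subscribers[event]:
            handler(**payload)

class BlacklistTracker:
    """Runs as a service over the bus, so callers never need a direct
    reference to the tracker -- they only need access to the bus."""
    def __init__(self, bus):
        self._banned = {}  # peer_id -> expiry timestamp
        bus.subscribe("peer.misbehaved", self.ban)
        bus.subscribe("peer.check", self.is_banned)

    def ban(self, peer_id, duration):
        self._banned[peer_id] = time.monotonic() + duration

    def is_banned(self, peer_id, reply):
        expiry = self._banned.get(peer_id)
        reply(expiry is not None and expiry > time.monotonic())

bus = EventBus()
tracker = BlacklistTracker(bus)
# Any part of the codebase that holds the bus can report a bad peer:
bus.publish("peer.misbehaved", peer_id="peer-a", duration=60)
```

The point of the design is the last line: the reporting site needs only the bus, not the tracker itself.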
Then, I plan to implement some version of the following rules.
- Immediate disconnect and blacklist for an appropriate amount of time.
- Mark them as useless and blacklist for an appropriate amount of time; disconnect the oldest/a random useless peer if the pool is full.
- Poorly connected, Lazy: use a token-bucket-based approach. If the token bucket is empty, disconnect and blacklist for an appropriate amount of time.
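A token-bucket rule of the kind described might be sketched as follows; the capacity and refill rate are placeholder numbers, not values from the issue:

```python
import time

class TimeoutBucket:
    """Each peer starts with a full bucket; every timeout spends a token.
    Tokens refill at a fixed rate, so a peer may time out in a burst but
    its long-run timeout rate must stay below the refill rate."""
    def __init__(self, capacity=5, refill_per_sec=0.1, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now

    def record_timeout(self):
        """Returns True when the peer should be disconnected and blacklisted."""
        self._refill()
        self.tokens -= 1.0
        return self.tokens < 0
```

Passing `clock` in makes the behavior easy to unit-test with a fake clock instead of real sleeps.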
Here, you're measuring something like timeouts per second? So they get some leeway to have a few timeouts in a row, but their average has to stay below some watermark. Interesting. The acceptable timeout rate might be worth adapting to the pool average: maybe you have a terrible network connection and the timeouts are your fault, in which case we don't want to churn our peers. Just a thought...
(note: this has less to do with the existing blacklist records and more to do with the more detailed peer records that are coming)
Thinking about database size, I went reaching for a mechanism to get the following two things.
Looking to achieve this by adding an
What's the rationale for the probabilistic expiration? No objection, just didn't follow.
It seems like a first draft that just expires everything after a month is okay, it gets something out the door that doesn't leak disk space too badly. (probably by tracking the last modified date instead of an expiration date in the DB, so that changing the expiration algorithm is trivial)
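That first draft could be a single DELETE run periodically; the `peer` table and `last_modified` column names below are assumptions, since the schema isn't shown in the thread:

```python
import sqlite3
import time

MONTH_SECONDS = 30 * 24 * 3600

def prune_stale_peers(db: sqlite3.Connection, now=None):
    """Drop every peer record not modified in the last month.
    Tracking last_modified (rather than a precomputed expiration
    date) keeps the expiration policy trivial to change later."""
    now = time.time() if now is None else now
    with db:  # commits on success, rolls back on error
        db.execute(
            "DELETE FROM peer WHERE last_modified < ?",
            (now - MONTH_SECONDS,),
        )
```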
There are assumptions here so I'll measure before I run with anything.
The motivation is disk footprint: keeping the sqlite database at a reasonable size. A 6-month expiration window seems too long; a node that is online for a continuous 6 months could build up a very large database (I need to measure). My thought is to handle the following two cases gracefully.
So I want to prune data from this database, but do it in a way that thins the data out over time rather than applying a hard cutoff, leaving a long tail of historical data that can still help the node re-connect quickly.
The mechanism may sound complex, but the implementation should be as simple as a single small function.
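The function itself wasn't included in the thread; as a sketch of a probabilistic thinning rule that matches the description, assuming each record's age is known (the half-life value is my placeholder, not a number from the issue):

```python
import random

def keep_probability(age_days: float, half_life_days: float = 30.0) -> float:
    """Probability of keeping a record during a pruning pass.
    Fresh records are always kept; the keep probability decays
    exponentially with age, so old data thins out gradually
    instead of vanishing at a hard cutoff."""
    return 0.5 ** (age_days / half_life_days)

def should_keep(age_days, rng=random.random, half_life_days=30.0):
    """Roll the dice once per record per pruning pass."""
    return rng() < keep_probability(age_days, half_life_days)
```

Each pruning pass keeps roughly half of month-old records, a quarter of two-month-old records, and so on, which produces the long tail described above.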
I mostly wrote this down to remember it. I'm not convinced it is necessary but the mechanism seems simple enough and should behave in a manner that satisfies the two edge cases above nicely.
Ah, so maybe the goal is to thin out the old nodes, but only thin them out if we have enough nodes in the DB to connect to. Maybe as simple as a rule to only thin out if there are more than 1000 entries, or something like it.
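Combining the thinning with a minimum-count guard could be as small as this; the threshold, half-life, and the idea of passing ages directly are all illustrative assumptions:

```python
import random

def prune_pass(ages_days, min_entries=1000, half_life_days=30.0,
               rng=random.random):
    """Thin old records probabilistically, but only once the table is
    large enough that dropping peers can't leave us short of candidates."""
    if len(ages_days) <= min_entries:
        return list(ages_days)  # small DB: keep everything
    # Keep each record with probability 0.5 ** (age / half_life)
    return [a for a in ages_days if rng() < 0.5 ** (a / half_life_days)]
```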
My concern isn't that the code is difficult, but wanting to make it easier to answer the question "why am I not trying to reconnect to this node anymore?"