Skip to content
This repository has been archived by the owner on Nov 15, 2019. It is now read-only.

Limit bootstrap peers #1106

Closed
povilasb opened this issue Jan 9, 2019 · 10 comments
Closed

Limit bootstrap peers #1106

povilasb opened this issue Jan 9, 2019 · 10 comments
Assignees
Labels

Comments

@povilasb
Copy link
Contributor

povilasb commented Jan 9, 2019

Currently Crust does 2 things we don't like:

  1. bootstrap cache has no limit. That can cause the system to behave unpredictibly: consume too much memory in RAM and/or disk, act slow where it's not expected, like loading the cache, etc.
  2. Service::bootstrap() will attempt all peers in parallel, even if there are thousands of cached peers. The problem with this is that many systems won't allow you to use more than ~1000 sockets in parallel.

So

  1. introduce configurable bootstrap cache limit with the reasonable defaults (say 10000 which roughly results in 40 kB cache). When bootstrap cache exceeds the limit, remove the contacts with smallest trust level - see bellow for description.
  2. Introduce a counter for bootstrap cache entries that logs how many times a peer was used to successfully bootstrap off. Then this counter will be used as a trust metric - the more we've bootstrap from the peer the more we trust it's availability.
  3. Limit parallel connections during bootstrap to prevent from exhausting system sockets.
@1uka
Copy link
Contributor

1uka commented Jan 13, 2019

Hey, I think I can take this up. About the trust counter, what I would do is change this hash set to a HashMap<PeerInfo, T> where T can be anything from an integer representing a counter to some more complicated trust metric (the time it's been live for example, or the instant when we last heard from that peer).

@povilasb
Copy link
Contributor Author

Great!

what I would do is change this hash set to a HashMap<PeerInfo, T>....

Sounds good to me, although personally I try to avoid generic code if I can, because it makes code more complex and verbose.

@ustulation
Copy link
Contributor

Yes let's design in smaller steps and introduce stuff when needed pls

@povilasb
Copy link
Contributor Author

@1uka are you still looking at working on this issue? :)

@1uka
Copy link
Contributor

1uka commented Jan 17, 2019

Yes yes, I've just been busy and have very little time these days. I will start working probably tonight or tomorrow.

@povilasb
Copy link
Contributor Author

Btw, some people from the office just told me that earlier the default for bootstrap cache limit was 1500. So go with that. Seems like Skype used the same constant back in the day when it still had peer-to-peer architecture :)

@1uka
Copy link
Contributor

1uka commented Jan 18, 2019

@povilasb Before I start, I have some questions.

Would you prefer if we saved the HashMap here, or just save the peers? The same applies here; should we send the map or just the set of peers?

And should I use time::Instant as a trust metric instead of a simple counter? That way, we will trust the most recent peers, meaning some peers that reached a high amount of trust in the past will be filtered if they become inactive, whilst if we use a counter they would be retried until the other peers surpass that counter.

PS: Considering my questions, I would not save the map if we use Instant as the value, and I would return only the recent peers.

@povilasb
Copy link
Contributor Author

Would you prefer if we saved the HashMap here, or just save the peers? The same applies here; should we send the map or just the set of peers?

I don't see the use of a HashMap there. Peers are stored in HashSet. Or am I missing something? ::)

I'd say keep the counter as a trust metric:

  1. if peer cached earlier won't be accessible anymore, it will be removed from the cache anyway;
  2. Number 1 ^ implies that all cached peers are accessible. Given that we trust more peers that we have reached hundred of times rather than the ones we just saw.

Mmm, I guess you meant to use HashMap to associate trust metric with PeerInfo. I'd say don't do it. Keep a list of PeerInfos ordered in a Vec or some other order retaining container. But keep pub fn peers() the same and return HashSet when asked. This way the Cache user does not care about the order, it only gets a unique set of cached peers. This order is only for bootstrap cache so that it could easily remove peers with least trust when needed.

@1uka
Copy link
Contributor

1uka commented Jan 24, 2019

Mmm, I guess you meant to use HashMap to associate trust metric with PeerInfo. I'd say don't do it. Keep a list of PeerInfos ordered in a Vec or some other order retaining container.

At first I thought associating the trust metric with PeerInfo is what you wanted. So now I am thinking about another solution (which would make keeping a trust metric unnecessary):

  • Use a VecDeque; we would always push new peers to the back, and if we try to put a peer that is already cached we move it to the back; now, when VecDeque is full we remove the first peer; the first peer in the queue will always be the one that was reached the least times. We even don't have to move the existing peer, we can just append the duplicate since peers() will be returning a set

But this way, once we reach the cache limit, the cache will always oscillate around full capacity.

Is this closer to what you would prefer? Or should I just change the HashMap value to u32 and use that as the metric instead of Instant, and when the cache is full we clean all the peers with the lowest trust?

Now I have one more question: say we use a counter as a trust metric, the cache is full and all the peers in it have been bootstrapped off at least 2 times; we want to insert a new peer in the cache which we haven't seen already.

  1. Number 1 ^ implies that all cached peers are accessible. Given that we trust more peers that we have reached hundred of times rather than the ones we just saw.

if we adhere to this, we would not insert the new peer, is that correct?

@povilasb povilasb self-assigned this Jan 28, 2019
@povilasb povilasb added the feat label Jan 28, 2019
@povilasb povilasb added this to To do in version 0.32.0 via automation Jan 28, 2019
@povilasb povilasb moved this from To do to In progress in version 0.32.0 Jan 28, 2019
version 0.32.0 automation moved this from In progress to Done Feb 13, 2019
@povilasb povilasb moved this from Done to Ready For QA in version 0.32.0 Feb 15, 2019
@S-Coyle S-Coyle assigned S-Coyle and unassigned povilasb Feb 15, 2019
@S-Coyle
Copy link
Contributor

S-Coyle commented Feb 15, 2019

Running some QA checks on this, keeping notes here https://gist.github.com/S-Coyle/91aeba06ea13bdbd386118d1e44fe1af

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
version 0.32.0
  
Ready For QA
Development

Successfully merging a pull request may close this issue.

4 participants