Display estimation of total number of nodes #1353
Conversation
In RT print routine, compute average of section length multiplied by 2 ^ section prefix length. This gives an estimation of the total number of nodes in the network. Display also the potential deviation given by max and min values.
|
Thanks for the pull request, and welcome! The MaidSafe team is excited to review your changes, and you should hear from @afck (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see CONTRIBUTOR.md for more information. |
|
Hi @Thierry61, thanks for the pull request! I'll be reviewing your changes tomorrow morning (Australian time). |
|
For information, here are the results on a local test network. I added nodes by group of ten until it reached 81 nodes. Then I removed nodes by group of ten also.
Note that:
|
|
Hi again @Thierry61, I've been thinking about network size estimation and I have a few observations:
For these reasons, I think it would be better if we omitted the "max" (upper bound) estimate, and maybe the "min" as it is currently calculated. If you're really keen, you could implement my lower bound idea from above, but I certainly won't block merging this PR on that (we can do it later). Given that, I think a message like this would be good: Or, if we implement the lower bound: Code-wise, I'd just like to request a few small changes (sorry this is turning into an essay). The main one is just that it would be better for the network estimation logic to be located outside of |
| // Result is exact when the whole xor space is covered by sections in our RT | ||
| // and is the sum of the sections size | ||
| let exact = map.iter() | ||
| .fold(0usize, |sum, (_, map)| sum + map.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use .sum() here
| let count_vec = map.iter() | ||
| .map(|(pfx, map)| map.len() * (1 << pfx.bit_count())) | ||
| .collect::<Vec<usize>>(); | ||
| let average = count_vec.iter().fold(0usize, |sum, i| sum + *i) / count_vec.len(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another spot we could use .sum()
|
I live in the opposite side of the planet (in France) so there are some delays in our communication due to time zone difference. Besides I don’t have a lot time to work on this (a few hours in the evening and on weekends). So, don’t expect an immediate delivery. Extracting the code from node.rs is good idea, but I think that putting it in an independent file would be an even better one because I think it would useful to display 2 kinds of messages depending on whether the RT covers the whole XOR space: or Later the second message could be completed like you suggested with the lower bound. I will wait for your approval and then try to deliver the minimal network_size.rs the following evening (without regression tests and lower bounds). For the rest, we will see later. |
|
A minimal definition of The size estimate function can take a We're planning to do some more stuff with metrics and stats soon, so it's alright to keep this change small. Thanks again 😄 |
|
I would prefer to keep it as simple as possible and I do think that it can be somewhat improved and simplified at the same time, and would make sense as a method of I suggest just using |
|
OK for a method of RoutingTable. OK for removing ± (and that was already agreed with Michael) But And be aware that the estimate formula can give the wrong result, even though we are in the condition to compute the exact value. Also the code is very short, it would a pity to not exploit it. So what about something explicit like: (here {RT_size} == {exact}) And (here {RT_size} < {estimate}, I am confident on this, but I have no proof) I replaced vault by node? And I added total? All this replacing “Routing Table size: {:3}”? on one line instead of 2 like I did? |
|
I don't think my proposal can give the wrong number: If the prefixes cover the whole name space, the divisor is 1 and the dividend is the exact number of nodes in the network. And I wouldn't print the formula in the logs, so it wouldn't be confusing users. |
|
Consider a network with 40 nodes and 3 sections: 0, 10, 11. The RT contains these 3 sections and covers the whole network. There are many cases where the estimated formula doesn’t return the exact value. Here are 2 examples, with realistic values (10 and 11 having about the same number of nodes and, 0 having about twice as much as each of them):
And 41 exact nodes in these 3 sections with 38 estimated nodes is a real case I came across in a past test (but I didn’t write down the nodes distribution). Frankly, the extra code is worth to add, and it’s only a few lines. |
|
Sorry, there's probably a misunderstanding. The formula I meant would apply as follows: In the first example, the network size equals the routing table size plus one, which is The same applies to the second example, except that the network size is |
|
Just a clarification, in my examples 40 was the total number of nodes, so routing table size was 39 but your formula still applies: (39 + 1) / 1 = 40. (41 total nodes was a real network test but the table with 40 total nodes was made with Excel) And you’re right, there was a misunderstanding: I didn’t understand that you proposed another formula. Now I understand it, but I would like a confirmation that it gives better results. Don’t talk about simplicity, I consider mine very simple, so it is important to implement the best formula. Concerning displayed information how about Label “Known nodes in the network” is only a proposal, so please, indicate to me the label to display. |
|
I am thinking, that I don’t have enough time, now and the next few days and I don’t want to slow you down, so please go ahead and reimplement it the way you want. |
|
No worries @Thierry61, I've implemented @afck's formula in #1358 |
|
Sorry for dragging out that discussion! But Michael implemented it already anyway, so I'll close this PR. |
In RT print routine, compute average of section length multiplied by 2 ^ section prefix length.
This gives an estimation of the total number of nodes in the network.
Display also the potential deviation given by max and min values.
This implements this idea proposed in this post in safe forum.