query all hosts from one dnsseed if the seed configured multi-host #61
Conversation
Compile error on macOS, I found:

At some point our Travis OSX builds just started failing for an insufficient libtool version, which I assume was caused by a platform change at Travis, since our builds hadn't changed materially. We updated the build generation to uninstall and install libtool on the OSX builds, which resolved the issue. Then a couple of days later this error started up. I'm traveling and haven't had time to look at it. If you can successfully patch the configuration I will apply the update via `libbitcoin-build` (or you can do so). `LT_INIT` is configured, so presumably there is an issue with the search path.
This is a great fix, sorry I didn't realize that this is what you were referring to in your initial question on the subject. Thanks for submitting.
src/utility/connector.cpp
Outdated
```cpp
safe_connect(iterator, socket, timer, handle_connect);
auto do_connecting = [&](boost::asio::ip::tcp::resolver::iterator iter){
	const auto timeout = settings_.connect_timeout();
```
Style: use 4 spaces per indent (vs. tabs).
Style: avoid abbreviations (`iter`), balance curly braces.
Style: use existing type alias (`asio::iterator`).
All fixed.
src/utility/connector.cpp
Outdated
```cpp
	safe_connect(iter, socket, timer, handle_connect);
};

//get all hosts under one seed
```
Generalize comments ("DNS record" vs. "seed"), since this affects all outbound network connections, not just seeding.
Style: use sentences (leading space, leading cap, period) in comments, e.g.:
// Get all hosts under one DNS record.
It would also be nice if you squashed the two commits to clarify history.
All fixed.
src/utility/connector.cpp
Outdated
```cpp
shared_from_this(), _1, socket, handle_connect));

safe_connect(iterator, socket, timer, handle_connect);
auto do_connecting = [&](boost::asio::ip::tcp::resolver::iterator iter){
```
Limit scope of closure by passing `this` in this case vs. `&`.
Hi, already fixed, using `[this, &handler]`.

Raising a new issue:
Confused~ however, it passed when I tested it yesterday. By the way, why does libbitcoin have to start listening on 8333 after connecting to seeds?
src/utility/connector.cpp
Outdated
```cpp
// Get all hosts under one DNS record.
for (asio::iterator end; iterator != end; ++iterator)
{
	do_connecting(iterator);
```
This will continue to loop following the first connection, yet the completion handler will have already been called. This extends the thread execution unnecessarily after connect. I recommend converting safe connect and this closure to bool and using success as a second loop termination condition.
Lack of network connectivity would do it.
It starts listening (if configured for inbound connections) following start (sync completion in node) so that it can receive inbound connections.
Libbitcoin design isn't based on the satoshi client. The listen call is asynchronous; there would be no reason to dedicate a thread to its invocation.
I have tested again.
I've fixed this issue already with `std::promise` and `std::future`, is that ok?
Hi @betachen, thanks for your work on this. As an aside, I should have pointed out the style guide previously. Two things I noticed in the changes are the use of raw pointers and line length. Also, my comments above on loop termination are incorrect (the async connect cannot return a result). Sorry for the misdirection, I was working from my phone and didn't have the bigger picture.

I don't entirely follow your latest description, but it seems from your patch that you desire to execute the connection attempts sequentially vs. concurrently. Despite the sequential implementation, the implementation above will result in the completion handler being invoked more than once, which will produce catastrophic results.

I assume that the failure you were encountering results from the fact that when any DNS record fails (or times out) the completion handler is invoked with an error code. This is then in a race with any successful completion, as both have invoked the completion handler. Assuming the handler remains in scope this will produce unintended results for the caller, which is relying on the contract of exactly one invocation of the handler.

If the DNS records can be contacted in parallel, which I assume is possible, then the proper implementation would be to use two synchronizers: one for the race between the set of connection attempts and the timer, and the other for the race between the connection attempts themselves (against each other). The latter synchronizer would have to be configured to terminate on first success or set exhaustion, and the timer synchronizer would have to terminate on timer firing or other synchronizer completion. Then you would simply loop over the connection attempts as you did in the earlier pull request.

However, parallel connection to a single physical host via its multiple DNS records is probably not so friendly, assuming more than one of the records is reachable.
Connecting sequentially requires a different design (the completion handler must be forwarded), and raises the question of what is the proper timeout. A long list of DNS host records could significantly delay connection, so I would not reset the timeout for each one.
Also, it looks like reverting the libtool reinstall has resolved the Travis OSX build breaks. I assume that Travis discovered the original problem and fixed it, at which point our patch started breaking things.
Looking more closely at the scenario, it seems that this iteration is incorrect. According to the boost documentation of async_connect:
The "sequence" is provided to the operation via the iterator.
Hi @evoskuil, thanks for your answer and patience. Actually I got catastrophic results on this commit. The final solution:

I'm afraid I have to digest it for a while first.

Yes, in the earlier commit, "hosts.cache" cached 1,000 hosts. However, about 50% of hosts are unreachable in China. This slows the procedure of searching for available hosts.
@betachen The current configurability should allow you to compensate for unreachable hosts. The default is based on the (non-China) experience of about 1 in 5 hosts being reachable (which targets a 20% success rate vs. your 50%). With 50% you would set `connect_batch_size` accordingly. You could set this number very high depending on circumstance. Your machine probably doesn't have 500 threads, but the asynchronous connection process spends most of its time waiting on the network, so this works just fine even with a low timeout. Extending the logical thread count via config can also help (by allowing more attempts to be started concurrently):

```ini
[network]
# The number of threads in the network threadpool, defaults to 50.
threads = 100
# The number of concurrent attempts to establish one connection, defaults to 5.
connect_batch_size = 500
```

Of course if connection establishment takes more time (which is probably the case for many of your peers) then increasing various timeouts can help:

```ini
[network]
# The time limit for connection establishment, defaults to 5.
connect_timeout_seconds = 5
# The time limit to complete the connection handshake, defaults to 30.
channel_handshake_seconds = 30
# The maximum time limit for obtaining seed addresses, defaults to 30.
channel_germination_seconds = 30
```

Interestingly, if you have low timeouts you tend to get faster peers when you do connect. With larger batches you also tend to get faster peers. In both cases the faster peers win the race. I wouldn't mess with this one unless you have a problem getting blocks during initial block download:

```ini
[node]
# The time limit for block receipt during initial block download, defaults to 5.
block_timeout_seconds = 5
```
Thanks, we work hard on this because it is a development library and the source is meant to educate. |
@betachen I'm closing this PR because the boost documentation makes it clear that iteration over the DNS records within the iterator is handled internally.
The DNS name query may hit an unreachable host if it only queries once.