Conversation

betachen
Member

@betachen betachen commented Oct 9, 2016

The DNS name query may hit an unreachable host if we only query once.

@coveralls

Coverage Status

Coverage decreased (-0.005%) to 1.843% when pulling e159e7f on betachen:master into b5aebbb on libbitcoin:master.

@betachen
Member Author

@evoskuil

I got a compile error on macOS:

Makefile.am:3: error: Libtool library used but 'LIBTOOL' is undefined
Makefile.am:3: The usual way to define 'LIBTOOL' is to add 'LT_INIT'
Makefile.am:3: to 'configure.ac' and run 'aclocal' and 'autoconf' again.
Makefile.am:3: If 'LT_INIT' is in 'configure.ac', make sure
Makefile.am:3: its definition is in aclocal's search path.

@evoskuil
Member

evoskuil commented Oct 10, 2016

At some point our Travis OSX builds just started failing due to an insufficient libtool version, which I assume was caused by a platform change at Travis, since our builds hadn't changed materially. We updated the build generation to uninstall and install libtool on the OSX builds, which resolved the issue. Then a couple of days later this error started up.

I'm traveling and haven't had time to look at it. If you can successfully patch the configuration I will apply the update via 'libbitcoin-build' (or you can do so). 'LT_INIT' is configured, so presumably there is an issue with the search path.

Member

@evoskuil evoskuil left a comment

This is a great fix, sorry I didn't realize this was what you were referring to in your initial question on the subject. Thanks for submitting.


safe_connect(iterator, socket, timer, handle_connect);
auto do_connecting = [&](boost::asio::ip::tcp::resolver::iterator iter){
const auto timeout = settings_.connect_timeout();
Member

Style: use 4 spaces per indent (vs. tabs).

Member

Style: avoid abbreviations (iter), balance curly braces.

Member

Style: use existing type alias (asio::iterator).

Member Author

all fixed.

safe_connect(iter, socket, timer, handle_connect);
};

//get all hosts under one seed
Member

Generalize comments ("DNS record" vs. "seed"), since this affects all outbound network connections, not just seeding.

Style: use sentences (leading space, leading cap, period) in comments, e.g.:

// Get all hosts under one DNS record.

Member

It would also be nice if you squashed the two commits to clarify history.

Member Author

All fixed.

shared_from_this(), _1, socket, handle_connect));

safe_connect(iterator, socket, timer, handle_connect);
auto do_connecting = [&](boost::asio::ip::tcp::resolver::iterator iter){
Member

@evoskuil evoskuil Oct 10, 2016

Limit scope of closure by passing this in this case vs. &.

Member Author

Hi, already fixed, using "[this, &handler]".
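
For illustration, a minimal self-contained sketch (not the PR's code; connector, start, and handler are hypothetical names) of the narrower capture suggested above versus a blanket [&] capture:

#include <functional>
#include <iostream>

class connector
{
public:
    void connect(const std::function<void(int)>& handler)
    {
        // A blanket [&] capture pulls in every local by reference, wider than needed.
        // The narrow capture exposes only the members (via this) and the handler.
        const auto attempt = [this, &handler]() { start(handler); };
        attempt();
    }

private:
    void start(const std::function<void(int)>& handler)
    {
        handler(attempts_++);
    }

    int attempts_ = 0;
};

int main()
{
    connector instance;
    instance.connect([](int count) { std::cout << "attempt " << count << "\n"; });
}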

@coveralls

Coverage Status

Coverage decreased (-0.005%) to 1.843% when pulling e2b8cb3 on betachen:master into b5aebbb on libbitcoin:master.

@coveralls

Coverage Status

Coverage decreased (-0.005%) to 1.843% when pulling a9d7bb9 on betachen:master into b5aebbb on libbitcoin:master.

@coveralls

Coverage Status

Coverage decreased (-0.005%) to 1.843% when pulling f959ab9 on betachen:master into b5aebbb on libbitcoin:master.

@coveralls

Coverage Status

Coverage decreased (-0.005%) to 1.843% when pulling e8ecd1b on betachen:master into b5aebbb on libbitcoin:master.

@betachen
Member Author

Raising a new issue:

17:06:52.730210 INFO [network] Failure contacting seed [seed.bitnodes.io:8333] unable to reach remote host
17:06:52.730339 ERROR [network] Error seeding host addresses: operation failed
17:06:52.730460 ERROR [node] Node failed to start with error, operation failed.
17:06:52.730606 INFO [node] Please wait while the node is stopping...

I'm confused; however, it passed when I tested it yesterday.

By the way, why does libbitcoin have to start listening on 8333 after connecting to seeds?
I remember that these are in separate, unrelated threads in bitcoin-core.

// Get all hosts under one DNS record.
for (asio::iterator end; iterator != end; ++iterator)
{
do_connecting(iterator);
Member

This will continue to loop following the first connection, yet the completion handler will have already been called. This extends the thread execution unnecessarily after connect. I recommend converting safe_connect and this closure to bool and using success as a second loop termination condition.

@evoskuil
Member

I'm confused; however, it passed when I tested it yesterday.

Lack of network connectivity would do it.

By the way, why does libbitcoin have to start listening on 8333 after connecting to seeds?

It starts listening (if configured for inbound connections) following start (sync completion in node) so that it can receive inbound connections.

I remember that these are in separate, unrelated threads in bitcoin-core.

Libbitcoin's design isn't based on the satoshi client. The listen call is asynchronous; there would be no reason to dedicate a thread to its invocation.
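
For illustration, a minimal standalone sketch (generic boost::asio usage, not libbitcoin's acceptor) showing why no dedicated thread is needed; async_accept returns immediately and the handler runs later on a threadpool thread:

#include <boost/asio.hpp>
#include <iostream>

int main()
{
    using boost::asio::ip::tcp;

    boost::asio::io_service service;
    tcp::acceptor acceptor(service, tcp::endpoint(tcp::v4(), 8333));
    tcp::socket socket(service);

    // Returns immediately; the completion handler fires when a peer connects.
    acceptor.async_accept(socket,
        [](const boost::system::error_code& ec)
        {
            std::cout << (ec ? ec.message() : "inbound connection") << std::endl;
        });

    // Any thread running the service can execute the handler.
    service.run();
}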

@betachen
Member Author

betachen commented Oct 11, 2016

I have tested again.

  • Case: the first DNS seed is reachable => Error seeding host addresses: operation failed
  • Case: the first DNS seed cannot be reached => startup proceeds normally,
    which keeps io_service in the same period, so the other connection attempts return in time, right?

I've already fixed this issue with "std::promise" and "std::future"; is that OK?

@coveralls

Coverage Status

Coverage decreased (-0.008%) to 1.84% when pulling 215088e on betachen:master into b5aebbb on libbitcoin:master.

@evoskuil
Member

evoskuil commented Oct 11, 2016

Hi @betachen, thanks for your work on this. As an aside, I should have pointed out the style guide previously. Two things I noticed in the changes are the use of raw pointers and line length.

Also, my comments above on loop termination are incorrect (the async connect cannot return a result). Sorry for the misdirection, I was working from my phone and didn't have the bigger picture.

Use of std::promise should be avoided in this case. It produces the intended sequence of operations, but in doing so it ties up a thread. This line: promise.get_future().get(); blocks while the async_connect(...) call awaits a response. Basically it turns asynchronous execution into synchronous execution. The proactor pattern used in libbitcoin networking can be quite confusing until one gets familiar with it.
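
To illustrate the blocking behavior (a generic sketch, not the PR's code; async_connect_stub is a hypothetical stand-in for the real asynchronous call):

#include <chrono>
#include <functional>
#include <future>
#include <iostream>
#include <thread>

// Stand-in for an asynchronous connect: invokes the handler from another thread.
void async_connect_stub(std::function<void(bool)> handler)
{
    std::thread([handler]()
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        handler(true);
    }).detach();
}

int main()
{
    std::promise<bool> promise;
    async_connect_stub([&promise](bool ok) { promise.set_value(ok); });

    // This blocks the calling thread until the handler fires, turning the
    // asynchronous call into a synchronous one (the problem described above).
    const auto connected = promise.get_future().get();
    std::cout << "connected: " << connected << std::endl;
}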

I don't entirely follow your latest description, but it seems from your patch that you want to execute the connection attempts sequentially rather than concurrently. Despite being sequential, the implementation above will result in the completion handler being invoked more than once, which will produce catastrophic results.

The synchronize(handler, 1, NAME, mode); line invokes the completion handler upon its first (1) invocation, successful or otherwise. This exists to manage the race between the connection timeout timer and the (single) asynchronous connection attempt. Placing this into a loop repeats the same operation for each DNS record, resulting in one call to the completion handler for each DNS record. Yet after the first call to the completion handler it is presumed to be out of scope.

I assume that the failure you were encountering results from the fact that when any DNS record fails (or times out) the completion handler is invoked with an error code. This is then in a race with any successful completion, as both have invoked the completion handler. Assuming the handler remains in scope this will produce unintended results for the caller, which is relying on the contract of exactly one invocation of the handler.

If the DNS records can be contacted in parallel, which I assume is possible, then the proper implementation would be to use two synchronizers, one for the race with the set of connection attempts against the timer and the other for the race between the set of connection attempts (against each other). The latter synchronizer would have to be configured to terminate on first success or set exhaustion, and the timer synchronizer would have to terminate on timer firing or other synchronizer completion. Then you would simply loop over the connection attempts as you did in the earlier pull request.
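
As a generic illustration of such a "first success or set exhaustion" guard (not libbitcoin's synchronize implementation), an atomic flag and counter can ensure the wrapped handler fires exactly once:

#include <atomic>
#include <functional>
#include <iostream>
#include <memory>

// Illustration only: the handler is invoked exactly once, on the first
// success or after every attempt has reported failure.
class race_guard
{
public:
    race_guard(size_t attempts, std::function<void(bool)> handler)
      : remaining_(attempts), done_(false), handler_(std::move(handler))
    {
    }

    void complete(bool success)
    {
        if (success)
        {
            // First success wins; later completions are ignored.
            if (!done_.exchange(true))
                handler_(true);
            return;
        }

        // Only report failure once every attempt has failed.
        if (--remaining_ == 0 && !done_.exchange(true))
            handler_(false);
    }

private:
    std::atomic<size_t> remaining_;
    std::atomic<bool> done_;
    std::function<void(bool)> handler_;
};

int main()
{
    auto guard = std::make_shared<race_guard>(3,
        [](bool ok) { std::cout << (ok ? "connected" : "all failed") << "\n"; });

    guard->complete(false);
    guard->complete(true);   // Handler fires here, exactly once.
    guard->complete(false);  // Ignored.
}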

However, parallel connection to a single physical host via its multiple DNS records is probably not so friendly, assuming more than one of the records is reachable. Connecting sequentially requires a different design (the completion handler must be forwarded), and raises the question of what is the proper timeout. A long list of DNS host records could significantly delay connection, so I would not reset the timeout for each one.

@evoskuil
Member

Also, it looks like reverting the libtool reinstall has resolved the Travis OSX build breaks. I assume that Travis discovered the original problem and fixed it, at which point our patch started breaking things.

@evoskuil
Member

Looking more closely at the scenario, it seems that this iteration is incorrect. According to the boost documentation for async_connect:

This function attempts to connect a socket to one of a sequence of endpoints. It does this by repeated calls to the socket's async_connect member function, once for each endpoint in the sequence, until a connection is successfully established.

The "sequence" is provided to the via iterator. Passing an incremented iterator in a loop as above will result in one call to the first host, two calls to the second, three calls to the third, and so on.

@betachen
Member Author

Hi @evoskuil, thanks for your answer and patience.
I am reading the style_guide; the libbitcoin source is pretty and tidy to me, and it's useful.

Actually, I got catastrophic results with this commit.
It seems to start up normally; however, the node is always interrupted by the next completion event or timeout event.
Secondly, as you said, it turns asynchronous execution into synchronous execution, so header sync slows down in a synchronous way, extremely slowly.

The final solution:

If the DNS records can be contacted in parallel, which I assume is possible, then the proper implementation would be to use two synchronizers, one for the race with the set of connection attempts against the timer and the other for the race between the set of connection attempts (against each other). The latter synchronizer would have to be configured to terminate on first success or set exhaustion, and the timer synchronizer would have to terminate on timer firing or other synchronizer completion. Then you would simply loop over the connection attempts as you did in the earlier pull request.

I'm afraid I have to digest it for a while first.

However, parallel connection to a single physical host via its multiple DNS records is probably not so friendly, assuming more than one of the records is reachable. Connecting sequentially requires a different design (the completion handler must be forwarded), and raises the question of what is the proper timeout. A long list of DNS host records could significantly delay connection, so I would not reset the timeout for each one.

Yes, in an earlier commit "hosts.cache" had cached 1,000 hosts; however, about 50% of hosts are unreachable in China. This slows down the procedure of searching for available hosts.
That is, if we have too large a host cache, we have to re-design the timeout, right?

@evoskuil
Member

evoskuil commented Oct 11, 2016

@betachen The current configurability should allow you to compensate for unreachable hosts. The default is based on the (non-China) experience of about 1 in 5 hosts being reachable (which targets a 20% success rate vs. your 50%). With 50% you would set connect_batch_size = 2 (2 concurrent attempts for every desired connection). This would allow you to get one connection per attempt on average.

You could set this number very high depending on circumstance. Your machine probably doesn't have 500 threads, but the asynchronous connection process spends most of its time waiting on the network, so this works just fine even with a low timeout. Extending the logical thread count via config can also help (by allowing more attempts to be started concurrently):

[network]
# The number of threads in the network threadpool, defaults to 50.
threads = 100
# The number of concurrent attempts to establish one connection, defaults to 5.
connect_batch_size = 500

Of course if connection establishment takes more time (which is probably the case for many of your peers) then increasing various timeouts can help:

[network]
# The time limit for connection establishment, defaults to 5.
connect_timeout_seconds = 5
# The time limit to complete the connection handshake, defaults to 30.
channel_handshake_seconds = 30
# The maximum time limit for obtaining seed addresses, defaults to 30.
channel_germination_seconds = 30

Interestingly, if you have low timeouts you tend to get faster peers when you do connect. With larger batches you also tend to get faster peers. In both cases the faster peers win the race.

I wouldn't mess with this one unless you have a problem getting blocks during initial block download:

[node]
# The time limit for block receipt during initial block download, defaults to 5.
block_timeout_seconds = 5

@evoskuil
Member

I am reading the style_guide; the libbitcoin source is pretty and tidy to me, and it's useful.

Thanks, we work hard on this because it is a development library and the source is meant to educate.

@evoskuil
Member

@betachen I'm closing this PR because the boost documentation makes it clear that iteration over the DNS records within the iterator is handled internal to async_connect(...). Please continue the thread in a new issue or on the IRC or mailing list if you have more questions.

@evoskuil evoskuil closed this Oct 11, 2016