network: Added ReachableHostPort() and a few helpers #6454

Merged
merged 5 commits into from Oct 18, 2016

Conversation

Projects
None yet
5 participants
Contributor

dimitern commented Oct 14, 2016

Based on PR #6426, but using a different, hopefuly simpler approach
with full test coverage. None of the added helpers are used yet, but
will be in a follow-up.

Prerequiste to fix http://pad.lv/1616098.

Contributor

dimitern commented Oct 14, 2016

Bogus job failure
!!build!!

Contributor

dimitern commented Oct 14, 2016

!!build!!

Contributor

dimitern commented Oct 14, 2016

!!build!!

Owner

nskaggs commented Oct 14, 2016

@dimitern I see the pre-check passed; what's wrong?

Contributor

dimitern commented Oct 14, 2016

@nskaggs it failed twice before that on lxc and once on trusty - with unrelated (build?) failures

network/hostport.go
+// UniqueHostPorts sends the unique entries in the given hostPorts slice on the
+// returned channel, closing the channel when done. It takes a stop channel,
+// which can be used to signal the sending loop to stop earlier.
+func UniqueHostPorts(stop <-chan struct{}, hostPorts []HostPort) <-chan HostPort {
@babbageclunk

babbageclunk Oct 17, 2016

Member

It doesn't seem like a host would ever have so many addresses that uniquifying them would need to be async (it'll be much much faster than a network hop anyway). This would be better just returning []HostPort and not using the goroutine.

@dimitern

dimitern Oct 17, 2016

Contributor

👍 Done - as a side-effect, I've removed DropDuplicatedHostPorts() as it's doing the same thing, only slightly less efficiently for a large input slice.

network/hostport.go
+// and clock are used to perform the dialing and measuring the time taken (which
+// is also set on error). Usually, a net.Dialer and clock.WallClock are passed
+// for dialer and clock, respectively.
+func DialHostPort(hostPort HostPort, dialer Dialer, clock clock.Clock) <-chan DialResult {
@babbageclunk

babbageclunk Oct 17, 2016

Member

As discussed below, this should take the destination channel as an argument.

@dimitern

dimitern Oct 17, 2016

Contributor

👍 Done

network/hostport.go
+
+ startTime := clock.Now()
+
+ conn, err := dialer.Dial("tcp", hostPort.NetAddr())
@babbageclunk

babbageclunk Oct 17, 2016

Member

Would it make sense to have a timeout on the Dial?

@dimitern

dimitern Oct 17, 2016

Contributor

net.Dialer has a Timeout field you can set - clarified that in the doc comment.

network/hostport.go
+
+ bestResult := DialResult{Duration: 24 * time.Hour}
+ for hostPort := range UniqueHostPorts(stop, hostPorts) {
+ lastResult := <-DialHostPort(hostPort, dialer, clock)
@babbageclunk

babbageclunk Oct 17, 2016

Member

This doesn't run the DialHostPorts in parallel - receiving from the channel immediately means that each one is done consecutively. You'd need to collect all of the channels up and select across them somehow. I think a better structure is to have one loop calling DialHostPort (passing in the same destination channel into each call), and then a second loop reading results from the channel.

@dimitern

dimitern Oct 17, 2016

Contributor

👍 Done, and that simplified the code and tests a lot :) Thanks!

It's great that there's less code. \o/

network/hostport.go
- bestResult = lastResult
- logger.Debugf("best endpoint %q has duration %s", bestResult.Endpoint, bestResult.Duration)
- }
+// FastestHostPort dials the unique entries in the given hostPorts, in parallel,
@frobware

frobware Oct 18, 2016

Contributor

"unique" is not needed here. It acts on whatever arguments you pass.

@dimitern

dimitern Oct 18, 2016

Contributor

👍 Removed

network/hostport.go
- bestResult = lastResult
- logger.Debugf("best endpoint %q has duration %s", bestResult.Endpoint, bestResult.Duration)
- }
+// FastestHostPort dials the unique entries in the given hostPorts, in parallel,
@frobware

frobware Oct 18, 2016

Contributor

s/FastestHostPort/ListeningEndpoints/ -- or something. Fastest doesn't convey that much? Fastest latency? Fastest for throughput?

@dimitern

dimitern Oct 18, 2016

Contributor

👍 As discussed on IRC, renamed to ReachableHostPort().

+
+ for _, hostPort := range uniqueHPs {
+ go func(hp HostPort) {
+ conn, err := dialer.Dial("tcp", hp.NetAddr())
@frobware

frobware Oct 18, 2016

Contributor

It's not immediately clear why we can Dial without a timeout. Is it because of the select/timeout further down? And if some probes take longer than others do these active go routines close all their connections?

@dimitern

dimitern Oct 18, 2016

Contributor

Because the dialer argument will be a net.Dialer, which already has a Timeout field you can set. This is explained in the doc comment. The final select is to ensure the global timeout argument is respected. Connections are always closed when established.

network/hostport.go
+ go func(hp HostPort) {
+ conn, err := dialer.Dial("tcp", hp.NetAddr())
+ if err == nil {
+ defer conn.Close()
@frobware

frobware Oct 18, 2016

Contributor

Just call close. Why defer? This is an OS resource. Best we could do is discard it immediately if we can.

@dimitern

dimitern Oct 18, 2016

Contributor

👍 Done

@dimitern dimitern changed the title from network: Added FastestHostPort and a few helpers to network: Added ReachableHostPort() and a few helpers Oct 18, 2016

Contributor

dimitern commented Oct 18, 2016

$$merge$$

Contributor

jujubot commented Oct 18, 2016

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot jujubot merged commit 0294fd5 into juju:develop Oct 18, 2016

@dimitern dimitern deleted the dimitern:network-fastesthostport branch Oct 18, 2016

jujubot added a commit that referenced this pull request Oct 21, 2016

Merge pull request #6481 from dimitern/lp-1616098-ssh+scp-alladdresses
lp:1616098: ssh,scp,dh pick reachable address among all

Uses the new SSHClient API facade v2 (from #6468), if
available, and network.ReachableHostPort() (#6454) in
code shared between the ssh, scp, and debug-hooks CLI
commands, to select a reachable address from all known
addresses of the target (machine/unit/host).

Fixes http://pad.lv/1616098.

QA steps (live tested on both MAAS 1.9 with multiple-
NIC nodes and on lxd - see below):

  1. Using the tip of `develop`, bootstrap a lxd xenial controller.
  2. juju switch controller
  3. juju status 0    # note of the IP address (IP1)
  4. lxc exec \<juju-instance-id\> -- ip addr add \<IP2\>/24 dev eth0
(add an extra IP on machine-0, from the same subnet)
  5. lxc exec \<juju-instance-id\> -- systemctl restart jujud-machine-0.service
(bounce the controller agent to pick up the new IP)
  6. juju ssh 0             # should work
  7. sudo ip route add blackhole \<IP1\>/32
(make the preferred public IP unreachable from your laptop)
  8. juju --debug ssh 0     # should fail with 'Invalid argument'
  9. git checkout dimitern/lp-1616098-ssh+scp-alladdresses
(switch to this PR's branch and rebuild juju & jujud)
  10. juju ssh 0                    # should work now, using IP2
  11. juju scp 0:/etc/issue .  # as above
  12. juju deploy ubuntu --to 0
  13. juju debug-hooks ubuntu/0 # should work as well
(small delay ~10s observed on lxd, because of 'sudo')
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment